Abstract

This  paper  proposes  some  tests  for  parameter  constancy  in  linear  regression 
models  with  possible  infinite  variance.  Both  dynamic  and  trending  regressors  are 
allowed.  The  tests  are  based  on  the  empirical  distribution  function  of  estimated 
residuals  and  are  shown  to  have  non-trivial  local  power  against  a  wide  range  of 
alternatives.  Within  a  certain  class  of  alternatives  including  simple  shifts,  the 
tests  have  higher  power  for  testing  the  simple  shift  alternatives.  These  tests  are 
formulated  in  such  a  way  that  the  limiting  variables  are  distribution-free.  The 
residuals  may  be  obtained  based  on  any  root-n  consistent  estimator  (under  the 
null)  of  regression  parameters.  As  part  of  these  results,  some  weak  convergence 
for  weighted  sequential  empirical  processes  of  residuals  is  established. 

Key  words  and  phrases:  structural  change,  empirical  distribution  function, 
weak  convergence,  nonparametric  test,  fluctuation  test,  CUSUM  test. 

1      Introduction 

There  are  various  sources  in  economics  that  could  cause  a  parametric  model  to  be 
unstable  over  a  period  of  time.  Changes  in  taste,  technical  progress,  and  changes 
in  policies  and  regulations  all  are  such  examples.  A  change  in  the  economic  agent's 
expectation  can  induce  a  change  in  the  reduced-form  relationship  among  economic 
variables,  even  though  no  change  in  the  parameters  of  the  structural  relationship  is 
present,  as  envisioned  by  the  Lucas  critique.    The  shifts  of  the  Phillips  curve  over 
time perhaps serve as the best illustration (Alogoskoufis and Smith, 1991). As a
result,  model  stability  has  always  been  an  important  concern  in  econometric  modeling. 
Earlier studies of parameter constancy include Chow (1960) and Quandt (1960).
Perhaps as a consequence of diagnostic failures, models capable of handling parameter
instability have continually emerged. The random-coefficients model of Cooley and
Prescott (1973) and the switching-regression models of Goldfeld and Quandt
(1973a,b), among numerous others, find widespread use in economics. The purpose of this
paper is to provide additional tools for the diagnosis of parameter instability in linear
regressions.

Recent  work  in  econometrics  on  this  topic  has  been  directed  toward  detecting 
parameter  changes  occurring  at  an  unknown  time,  hardly  a  new  problem  given  the 
large  body  of  related  literature  in  econometrics  and  statistics.  Econometricians  are 
particularly  concerned  with  parameter  instability  in  dynamic  models  with  trending 
regressors,  cointegrated  variables  and  perhaps  a  unit  root,  and  with  serially  correlated 
or  heteroskedastic  disturbances.  Various  test  statistics  that  are  capable  of  detecting 
changes  in  those  situations  have  been  developed;  see,  for  example,  Andrews  (1990), 
Chu  and  White  (1992),  Hansen  (1992),  Perron  (1991),  and  Ploberger,  Kramer  and 
Kontrus  (1989).  Empirical  applications  together  with  supporting  theory  can  be  found 
in Bai, Lumsdaine and Stock (1991), Banerjee, Lumsdaine and Stock (1992), Christiano
(1992), Perron (1989), and Zivot and Andrews (1992), among others. In this
paper,  we  propose  some  tests  able  to  detect  structural  instability  for  some  of  these 
models.  In  addition,  these  tests  are  applicable  to  infinite  variance  regressions. 

Two  classes  of  tests  are  proposed,  resembling  the  prototypical  Kolmogorov-Smirnov 
two-sample test. The first class is based on non-weighted sequential empirical processes
of  residuals.  This  class  was  previously  considered  by  Picard  (1985)  and  Csorgo  and 
Horvath  (1988),  among  others.  However,  these  authors  only  consider  the  case  of  i.i.d. 
observations  under  the  null.  We  extend  the  tests  to  apply  to  regression  models  with 
estimated  parameters. 

The first class of tests has limited applicability in time series econometrics since,
if trending regressors are included in the regression model, the tests will no longer
be  asymptotically  distribution-free.  In  this  case,  the  second  class  of  tests  can  be 
considered,  obtained  by  constructing  a  weighted  empirical  process  of  residuals  with 
weights  equal  to  the  regressors,  apart  from  some  weighting  matrices.  This  class  of 
tests  is  asymptotically  distribution-free  whether  or  not  a  trend  regressor  is  present. 
Our  procedure  may  be  regarded  as  nonparametric,  yet  it  is  not  fully  nonparametric 
in  light  of  the  need  to  estimate  the  regression  parameters.  By  way  of  construction, 
our  tests  are  robust  against  heavy-tailed  distributions  and  data  aberrations.  Recent 
work of Carlstein (1988) and Duembgen (1991), who consider the estimation of the
shift point under the single shift alternative (two i.i.d. samples) and obtain a good
convergence rate, indicates that tests of the Kolmogorov-Smirnov type may be
powerful.

The classical statistical literature shows that goodness-of-fit tests based on empirical
processes involving estimated parameters will depend upon both the estimated
parameters and the underlying error distribution function even in the limit (see Durbin
(1973)). It is somewhat surprising that we can eliminate this dependence by choosing
weighting vectors, in a natural way, in the construction of the empirical processes upon
which our tests are based, whereas classical goodness-of-fit tests can be made
asymptotically distribution-free merely by essentially abandoning part of the observations
(see Durbin (1976)).

The  tests  proposed  in  this  paper  are  quite  general  in  the  sense  that  we  require  no 
finite  variance  for  the  disturbances.  Both  dynamic  and  trending  regressors  are  allowed 
in  the  regressions.  Within  certain  classes  of  alternatives,  we  show  the  tests  to  be  more 
powerful  when  used  for  testing  simple  shift  alternatives,  as  expected.  Moreover,  these 
tests  exhibit  non-trivial  local  power. 

As  a  related  result  that  may  be  of  independent  interest,  a  weak  convergence  for 
randomly-weighted  sequential  empirical  processes  has  been  obtained.  We  then  use  this 
result  to  obtain  the  weak  convergence  for  its  counterpart  for  the  regression  residuals, 
laying  the  theoretical  foundation  of  our  tests. 


This  paper  is  organized  as  follows.  Section  2  specifies  the  models  and  describes 
the  assumptions.  Section  3  defines  the  test  statistics.  Section  4  provides  alternative 
expressions  for  the  test  statistics  that  are  suitable  for  computation.  Section  5  examines 
the  local  power  of  the  tests.  Trending  regressors  are  considered  in  Section  6.  Some 
comments  and  possible  extensions  are  discussed  in  Section  7.  Technical  materials  are 
collected  in  the  appendix. 

2      Models  and  Assumptions 

The  regression  model  under  the  null  hypothesis  is 

$$y_t = x_t'\beta + \epsilon_t \qquad (t = 1, 2, \ldots, n) \tag{1}$$

where $y_t$ is an observation of the dependent variable, $x_t$ is a $p \times 1$ vector of observations
of the independent variables, $\epsilon_t$ is an unobservable stochastic disturbance, and $\beta$ is
the $p \times 1$ vector of regression coefficients.

The non-null hypothesis specifies the following model:

$$y_t = x_t'\beta_t + \epsilon_t^*,$$

where the $\beta_t$ may not be constant over time and the disturbances $\epsilon_t^*$ may not be
identically distributed. In particular, we are interested in the following local alternatives.

i) Changing regression parameters: $\beta_t = \beta(1 + \Delta_1 g(t/n) n^{-1/2})$.

ii) Changing variance: $\epsilon_t^* = \epsilon_t(1 + \Delta_2 h(t/n) n^{-1/2})$.

iii) Both i) and ii).

Here $\Delta_1$ and $\Delta_2$ are two real numbers; the functions $h$ and $g$ are assumed to be
bounded.

In what follows, the notation $o_p(1)$ ($O_p(1)$) is used to denote a sequence of random
variables converging to zero in probability (being stochastically bounded). The norm
$\|\cdot\|$ represents the Euclidean norm, i.e. $\|x\| = (\sum_{i=1}^{p} x_i^2)^{1/2}$ for $x \in R^p$. Finally, $[\cdot]$
denotes the greatest integer function.

We  impose  the  following  assumptions: 


(A.1) Under the null hypothesis, the $\epsilon_t$ are i.i.d. with distribution function (d.f.) $F$,
which admits a density function $f$, $f > 0$. Both $f(x)$ and $xf(x)$ are assumed to be
uniformly continuous on the real line. Furthermore, there exists a finite number $L$
such that $|xf(x)| \le L$ and $|f(x)| \le L$ for all $x$. The mean of $\epsilon_t$ is zero if this mean
exists.

(A.2) The disturbances $\epsilon_t$ are independent of all contemporaneous and past regressors.

(A.3) The regressors satisfy

$$\operatorname{plim} \frac{1}{n} \sum_{t=1}^{[ns]} x_t x_t' = sQ \qquad \text{for } s \in [0, 1]$$

where $Q$ is a $p \times p$ nonrandom positive definite matrix. The convergence is necessarily
uniform in $s$, because the sum is "monotonic" in $s$.

(A.4)

$$\max_{1 \le t \le n} n^{-1/2} \|x_t\| = o_p(1).$$

(A.5) For every fixed $s_1$, there exists a sequence of positive random variables $Z_n =
O_p(1)$ such that

$$\frac{1}{n} \sum_{t=[ns_1]}^{[ns]} \|x_t\| \le (s - s_1) Z_n \qquad \text{a.s.}$$

for all $s > s_1$. In addition, the tail probability of $Z_n$ satisfies, for some $\rho > 0$:

$$P(|Z_n| > C) \le M / C^{2(1+\rho)}.$$

Note that $Z_n$ may be taken to be $\max_k k^{-1} \sum_{t=i}^{i+k} \|x_t\|$ provided the condition on the
tail probability is also satisfied, where $i = [ns_1]$ is fixed.

(A.6) There exist $\gamma > 1$, $\alpha > 1$ and $K < \infty$ such that for all $0 \le s' < s'' \le 1$, and for
all $n$,

$$\frac{1}{n} \sum_{i < t \le j} E(x_t'x_t)^{\gamma} \le K(s'' - s') \quad \text{and} \quad E\Big(\frac{1}{n} \sum_{i < t \le j} x_t'x_t\Big)^{\gamma} \le K(s'' - s')^{\alpha},$$

where $i = [ns']$, $j = [ns'']$. The assumption is satisfied if the $x_t$ are bounded regressors.
Also if $E(x_t'x_t)^2 \le M$ for all $t$, then the assumption is satisfied with $\gamma = 2$ and $\alpha = 2$,
because $E(\sum_{t=i}^{j} x_t'x_t)^2 \le \{\sum_{t=i}^{j} [E(x_t'x_t)^2]^{1/2}\}^2$ by the Cauchy-Schwarz inequality.


(A.7)

$$(X'X)^{1/2}(\hat\beta - \beta) = O_p(1)$$

where $X = (x_1, x_2, \ldots, x_n)'$. When the disturbances are i.i.d. and have finite variance,
the least squares estimator satisfies this assumption. For infinite variance models,
a robust estimation method such as least absolute deviations (LAD) has to be used to ensure (A.7).
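(A.8) There exist a $\delta > 0$ and an $M < \infty$ such that

$$E\Big(\frac{1}{n} \sum_{t=1}^{n} \|x_t\|^{3(1+\delta)}\Big) \le M \quad \text{and} \quad E\Big(\frac{1}{n} \sum_{t=1}^{n} \|x_t\|^{3}\Big)^{1+\delta} \le M \qquad \forall n.$$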


(A.9) Finally

$$\operatorname{plim} \frac{1}{n} \sum_{t=1}^{[ns]} x_t = s\bar{x} \qquad \text{uniformly in } s \in [0, 1]$$

where $\bar{x}$ is a $p \times 1$ constant vector. When a constant regressor is included, (A.9) is
implied by (A.3).

Assumptions (A.3) and (A.9) exclude trending regressors, which will be discussed
in Section 6.

3      The  Test  Statistics 

Let $\hat\beta$ be an estimator of $\beta$ and put $\hat\epsilon_t = y_t - x_t'\hat\beta$. The test is based on the estimated
residuals $\hat\epsilon_t$. For each fixed $k$, define the empirical distribution function (e.d.f.) based
on the first $k$ residuals as

$$\hat F_k(x) = \frac{1}{k} \sum_{t=1}^{k} I(\hat\epsilon_t \le x)$$

and the e.d.f. based on the last $n - k$ residuals as

$$\hat F^*_{n-k}(x) = \frac{1}{n-k} \sum_{t=k+1}^{n} I(\hat\epsilon_t \le x).$$

Further define

$$T_n\Big(\frac{k}{n}, x\Big) = \frac{k}{n}\Big(1 - \frac{k}{n}\Big)\sqrt{n}\,\big(\hat F_k(x) - \hat F^*_{n-k}(x)\big)$$

and the test statistic

$$M_n = \max_k \sup_x |T_n(k/n, x)|$$

where the max is taken over $1 \le k \le n$ and the supremum with respect to $x$ is
taken over the entire real line. For each fixed $k$, the supremum of $T_n$ with respect to
the second argument gives the weighted Kolmogorov-Smirnov two-sample test with
weights $[(k/n)(1 - k/n)]^{1/2}$. Thus the test $M_n$ looks for the maximum value of weighted
Kolmogorov-Smirnov statistics for all possible sample splits.
We  have  the  following  identities: 

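$$T_n\Big(\frac{k}{n}, x\Big) = n^{-1/2} \sum_{t=1}^{k} I(\hat\epsilon_t \le x) - \frac{k}{n}\, n^{-1/2} \sum_{t=1}^{n} I(\hat\epsilon_t \le x) \tag{2}$$

$$= n^{-1/2} \sum_{t=1}^{k} \{I(\hat\epsilon_t \le x) - F(x)\} - \frac{k}{n}\, n^{-1/2} \sum_{t=1}^{n} \{I(\hat\epsilon_t \le x) - F(x)\}. \tag{3}$$

Writing $T_n$ in the form (3) will be convenient for studying the limiting distribution of $T_n$
and hence of $M_n$.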
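As a purely illustrative check of identity (2), the following Python/NumPy sketch computes $T_n(k/n, x)$ both from the two-sample e.d.f. definition and from the partial-sum form, and obtains $M_n$ by brute force. The function names are hypothetical and not part of the paper; the sketch assumes the residuals have no ties, so that the supremum over $x$ is attained at one of the residuals.

```python
import numpy as np

def tn_definition(resid, k, x):
    """T_n(k/n, x) from the two-sample e.d.f. definition (needs 1 <= k < n)."""
    n = resid.size
    f_first = np.mean(resid[:k] <= x)   # e.d.f. of the first k residuals
    f_last = np.mean(resid[k:] <= x)    # e.d.f. of the last n - k residuals
    return (k / n) * (1 - k / n) * np.sqrt(n) * (f_first - f_last)

def tn_partial_sum(resid, k, x):
    """T_n(k/n, x) from the partial-sum form, identity (2)."""
    n = resid.size
    ind = (resid <= x).astype(float)
    return (ind[:k].sum() - (k / n) * ind.sum()) / np.sqrt(n)

def mn_bruteforce(resid):
    """M_n = max_k sup_x |T_n(k/n, x)|, evaluating x at each residual."""
    n = resid.size
    return max(abs(tn_partial_sum(resid, k, x))
               for k in range(1, n + 1) for x in resid)
```

The partial-sum form is also defined at $k = n$, where it equals zero, which is why the brute-force maximum can run over $1 \le k \le n$.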

As will be shown, the test $M_n$ has non-trivial local power against changes in the
scale parameter of the disturbances. However, like the CUSUM test, it has no local
power against shifts in the regression parameters if the mean regressor is zero. To
circumvent this undesirable feature, we introduce a new class of tests based on the weighted
e.d.f. of residuals. Let $X_k = (x_1, \ldots, x_k)'$ and

$$A_k = (X'X)^{-1/2}(X_k'X_k)(X'X)^{-1/2}. \tag{4}$$

Define the $p \times 1$ vector process $T_n^*$,

$$T_n^*\Big(\frac{k}{n}, x\Big) = (X'X)^{-1/2} \sum_{t=1}^{k} x_t\{I(\hat\epsilon_t \le x) - \hat F_n(x)\} - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t\{I(\hat\epsilon_t \le x) - \hat F_n(x)\} \tag{5}$$

and the test statistic

$$M_n^* = \max_k \sup_x \Big\|T_n^*\Big(\frac{k}{n}, x\Big)\Big\|_\infty$$

where $\|y\|_\infty = \max\{|y_1|, \ldots, |y_p|\}$, the maximum norm. The process $T_n^*$ and test $M_n^*$
reduce to $T_n$ and $M_n$, respectively, when the weights $x_t = 1$ for all $t$.
If there is a constant regressor, then the following identity holds,

$$(X'X)^{-1/2} \sum_{t=1}^{k} x_t - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t = 0, \qquad \forall k \tag{6}$$

so that the value of $T_n^*(k/n, x)$ will not change when $\hat F_n(x)$ in (5) is replaced by an
arbitrary function of $x$. In particular, $T_n^*$ can be written as

$$(X'X)^{-1/2} \sum_{t=1}^{k} x_t\{I(\hat\epsilon_t \le x) - F(x)\} - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t\{I(\hat\epsilon_t \le x) - F(x)\}. \tag{7}$$

Equation (7) is a weighted version of (3). This expression cannot be used to compute
the test $M_n^*$, as $F(x)$ is unknown; however, it will be useful in deriving the limiting
process of $T_n^*$. When computing the test, one should omit $\hat F_n(x)$ if a constant regressor
is included. However, whether or not there is a constant regressor, the two expressions
for $T_n^*$, (5) and (7), have the same null limiting process, because $n^{1/2}\{\hat F_n(x) - F(x)\} =
O_p(1)$ uniformly in $x$, a well known result for the residual e.d.f. (Shorack and Wellner
1986, Chapter 4), and

$$n^{-1/2}\Big\{(X'X)^{-1/2} \sum_{t=1}^{k} x_t - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t\Big\} = o_p(1)$$

uniformly in $k$ by assumptions (A.3) and (A.9). Finally, we remark here that if none
of the regressors are trending, then we may substitute the scalar $k/n$ for the matrix
$A_k$.
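The weighted process (5) and the role of identity (6) can be sketched in Python/NumPy as follows. This is a minimal illustration, not the author's program: the names `inv_sqrt`, `tn_star` and `mn_star_bruteforce` are hypothetical, the symmetric inverse square root is one of several valid choices for $(X'X)^{-1/2}$, and the residuals are assumed to have no ties.

```python
import numpy as np

def inv_sqrt(mat):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs / np.sqrt(vals)) @ vecs.T

def tn_star(X, resid, k, x, center=True):
    """T_n^*(k/n, x) as in (5); center=False drops the e.d.f. term F_n(x)."""
    S = inv_sqrt(X.T @ X)                    # (X'X)^{-1/2}
    A_k = S @ (X[:k].T @ X[:k]) @ S          # matrix A_k of (4)
    dev = (resid <= x).astype(float)
    if center:
        dev = dev - dev.mean()               # I(e_t <= x) - F_n(x)
    return S @ (X[:k].T @ dev[:k]) - A_k @ (S @ (X.T @ dev))

def mn_star_bruteforce(X, resid, center=True):
    """M_n^*: maximum over k and over x at each residual of the max norm."""
    n = resid.size
    return max(np.abs(tn_star(X, resid, k, x, center)).max()
               for k in range(1, n + 1) for x in resid)
```

With a column of ones in `X`, `center=True` and `center=False` give the same value, which is identity (6) at work; with `X` equal to a single column of ones, `tn_star` reduces to the scalar $T_n$.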

Let $B(u, v)$ be a Gaussian process on $[0, 1]^2$ with zero mean and covariance function

$$E\{B(s,u)B(t,v)\} = (\min(s,t) - st)(\min(u,v) - uv),$$

which we shall call a two-parameter Brownian bridge on $[0, 1]^2$. In what follows,
the notation "$\Rightarrow$" is used to denote weak convergence in the space $D(T)$
or $D(T) \times D(T) \times \cdots \times D(T)$, where $T = [0, 1]^2$, under the (extended) Skorohod $J_1$
topology.

Theorem 1    Under model (1) and assumptions (A.1)-(A.9),

(i) $T_n\big(\frac{[n\cdot]}{n}, \cdot\big) \Rightarrow B(\cdot, F(\cdot))$

and

(ii) $T_n^*\big(\frac{[n\cdot]}{n}, \cdot\big) \Rightarrow B^*(\cdot, F(\cdot))$

where $B^* = (B_1, B_2, \ldots, B_p)'$ is a vector of $p$ independent two-parameter Brownian
bridges on $[0, 1]^2$.

Let $G(\cdot)$ denote the d.f. of the r.v. $\sup_{0 \le u \le 1} \sup_{0 \le v \le 1} |B(u,v)|$, which is tabulated
in Picard (1985).

Corollary 1    Under the assumptions of Theorem 1,

$$\lim_{n \to \infty} P(M_n \le x) = G(x)$$

and

$$\lim_{n \to \infty} P(M_n^* \le x) = [G(x)]^p.$$
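As a rough illustration of how the null distribution $G$ could be approximated by simulation, the sketch below draws $M_n$ for i.i.d. observations. It relies on an assumption beyond the text: for i.i.d. continuous data with no estimated parameters, $M_n$ depends only on the ranks, which form a uniformly random permutation. The function names, the sample size and the number of replications are all hypothetical, and the resulting quantile is only a finite-sample stand-in for the values tabulated in Picard (1985).

```python
import numpy as np

def mn_from_ranks(ranks):
    """M_n computed from the ranks (values 1..n) of the observations."""
    n = ranks.size
    grid = np.arange(1, n + 1)
    # below[t, j-1] = 1 if the rank of observation t is <= j
    below = (ranks[:, None] <= grid[None, :]).astype(float)
    counts = np.cumsum(below, axis=0)      # counts[k-1, j-1] = #{t <= k : R_t <= j}
    expected = grid[:, None] * grid[None, :] / n   # j * k / n
    return np.abs(counts - expected).max() / np.sqrt(n)

def simulate_null_quantile(n=100, reps=500, level=0.95, seed=0):
    """Monte Carlo quantile of M_n under the i.i.d. null (finite-n approximation)."""
    rng = np.random.default_rng(seed)
    draws = np.array([mn_from_ranks(rng.permutation(n) + 1) for _ in range(reps)])
    return np.quantile(draws, level)
```

For $M_n^*$ with $p$ regressors, Corollary 1 suggests using the $\text{level}^{1/p}$ quantile of the same simulated distribution, since the limit is $[G(x)]^p$.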

The proof of the theorem is based on the limiting behavior of the process $K_n^*$,

$$K_n^*(s, x) = (X'X)^{-1/2} \sum_{t=1}^{[ns]} x_t\{I(\hat\epsilon_t \le x) - F(x)\}$$

which we shall call the weighted sequential empirical process of residuals (w.s.e.p.).
Note

$$T_n^*\Big(\frac{[ns]}{n}, x\Big) = K_n^*(s, x) - A_{[ns]} K_n^*(1, x).$$

Introduce

$$H_n(s, x) = (X'X)^{-1/2} \sum_{t=1}^{[ns]} x_t\{I(\epsilon_t \le x) - F(x)\}$$

which is a non-residual version of the w.s.e.p. Theorem A.2 in the Appendix implies that
$K_n^*(s, x)$ can be written as

$$H_n(s, x) + f(x)(X'X)^{-1/2}(X_{[ns]}'X_{[ns]})(\hat\beta - \beta) \tag{8}$$

plus an $o_p(1)$ term which is uniformly small in both $s$ and $x$. The second term above
is identical to the corresponding term of $A_{[ns]} K_n^*(1, x)$, so that

$$T_n^*\Big(\frac{[ns]}{n}, x\Big) = H_n(s, x) - A_{[ns]} H_n(1, x) + o_p(1).$$

Corollary A.2 in the Appendix gives the limiting process of $T_n^*$.
The limiting process of $K_n^*$, if it exists, will depend on the limiting distribution
of the estimated parameters and on the error density function $f$, as is easily seen
from (8). However, parameter estimation does not affect the limiting process of $T_n^*$.
The fact that the limiting process of $T_n^*$ depends on $F$ rather than $f$ allows us to
construct distribution-free tests. The sup-type test, for example, transforms out this
dependence on $F$. Further, if the error $\epsilon_t$ has a symmetric distribution about zero, so
that $F(0) = 1/2$, then tests based on $T_n^*(s, 0)$ will also be asymptotically
distribution-free.

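Besides the sup-type tests, the mean-type test can also be used. Let

$$A_n = \frac{1}{n^2} \sum_k \sum_j \Big|T_n\Big(\frac{k}{n}, \hat\epsilon_j\Big)\Big|^2 \quad \text{and} \quad A_n^* = \frac{1}{n^2} \sum_k \sum_j \Big\|T_n^*\Big(\frac{k}{n}, \hat\epsilon_j\Big)\Big\|^2.$$

The result of Theorem 1 implies that $A_n$ converges in distribution to $\int_0^1 \int_0^1 B(s,t)^2\,ds\,dt$
and $A_n^*$ converges in distribution to $\int_0^1 \int_0^1 \sum_{i=1}^{p} B_i(s,t)^2\,ds\,dt$, where $B_1, \ldots, B_p$ are
independent copies of $B(\cdot,\cdot)$. Many other tests can be constructed based on the weak
convergence of Theorem 1.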

The weak convergence of empirical processes based on estimated residuals has
been studied by many authors; see, for example, Mukantseva (1978), Boldin (1982,
1989), Pierce and Kopecky (1982), and Kreiss (1991). It appears that Koul (1970)
was among the first to study weighted empirical processes, followed by Withers
(1972). Weighted empirical processes of residuals have been studied by Koul (1984,
1991). Shorack and Wellner (1986) give more references on residual empiricals. The
weak convergence for the sequential version, which is essential for the structural change
problem, has not been widely examined: Bai (1991) considered the sequential empirical
process for ARMA residuals.

4      Computing  the  tests 

We now derive some alternative expressions for the test statistics $M_n$ and $M_n^*$ that
are suitable for programmed computation. We shall focus on $M_n^*$; the test $M_n$ is a
special case. For each fixed $k$, $\|T_n^*(k/n, x)\|_\infty$ can only change its value
at $\hat\epsilon_1, \hat\epsilon_2, \ldots, \hat\epsilon_n$ when $x$ varies; therefore, the maximum value with respect to $x$ can
be found at $x = \hat\epsilon_i$ $(i = 1, 2, \ldots, n)$ or, equivalently, at $x = \hat\epsilon_{(i)}$ $(i = 1, 2, \ldots, n)$, where
$\hat\epsilon_{(i)}$ is the $i$-th order statistic. Let $R_1, R_2, \ldots, R_n$ denote the ranks of $\hat\epsilon_1, \hat\epsilon_2, \ldots, \hat\epsilon_n$
and $D_1, D_2, \ldots, D_n$ denote the anti-ranks, so that $R_{D_i} = D_{R_i} = i$. For a fixed $j$, let
us evaluate $T_n^*(k/n, x)$ at $x = \hat\epsilon_{(j)}$. First assume there is a constant regressor, so that
$T_n^*$ is equivalent to the expression (5) with $\hat F_n(x)$ omitted due to (6). Since
$\sum_{t=1}^{n} x_t I(\hat\epsilon_t \le \hat\epsilon_{(j)})$ is the sum of those vectors $x_t$ such that $\hat\epsilon_t$ is not larger than $\hat\epsilon_{(j)}$, it
can be written as $\sum_{i=1}^{j} x_{D_i}$. Similarly, $\sum_{t=1}^{k} x_t I(\hat\epsilon_t \le \hat\epsilon_{(j)}) = \sum_{i=1}^{j} x_{D_i} I(D_i \le k)$. Thus
if we define a sequence of numbers $Z_{t,k}$ $(t = 1, 2, \ldots, n)$ such that

$$Z_{t,k} = \begin{cases} 1 & \text{for } t = 1, 2, \ldots, k \\ 0 & \text{for } t = k+1, k+2, \ldots, n \end{cases}$$

then $Z_{D_i,k} = 1$ if and only if $D_i \le k$. Thus

$$T_n^*\Big(\frac{k}{n}, \hat\epsilon_{(j)}\Big) = (X'X)^{-1/2} \Big\{\sum_{i=1}^{j} Z_{D_i,k}\, x_{D_i} - (X_k'X_k)(X'X)^{-1} \sum_{i=1}^{j} x_{D_i}\Big\} \tag{9}$$

and

$$M_n^* = \max_k \max_j \Big\|(X'X)^{-1/2} \sum_{i=1}^{j} \{Z_{D_i,k} I - (X_k'X_k)(X'X)^{-1}\}\, x_{D_i}\Big\|_\infty \tag{10}$$

where $I$ is the $p \times p$ identity matrix. Taking $x_t = 1$ in the above formula for all $t$, we
obtain

$$M_n = \max_k \max_j \frac{1}{\sqrt{n}} \Big|\sum_{i=1}^{j} Z_{D_i,k} - j\frac{k}{n}\Big|$$

yielding an easily computable formula; see a similar formula in Hajek (1969, p. 62-63).
When there is no constant regressor, the expression (9) has to be adjusted by
subtracting the left hand side of (6) multiplied by $j/n$, which is the product of the
left hand side of (6) and $\hat F_n(\hat\epsilon_{(j)})$. The test $M_n^*$ is adjusted accordingly and $M_n$ stays
the same.1


1 A SAS program for computing the statistics is available upon request.
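The rank-based computation of this section can be sketched in Python/NumPy as below. This is an illustrative implementation under stated assumptions, not the SAS program mentioned in the footnote: the names `inv_sqrt` and `mn_star` are hypothetical, a constant regressor is assumed to be among the columns of `X` (so the e.d.f. term is omitted), and the residuals are assumed to have no ties.

```python
import numpy as np

def inv_sqrt(mat):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs / np.sqrt(vals)) @ vecs.T

def mn_star(X, resid):
    """M_n^* via anti-ranks, following formula (10)."""
    n, p = X.shape
    XtX = X.T @ X
    S = inv_sqrt(XtX)                        # (X'X)^{-1/2}
    XtX_inv = np.linalg.inv(XtX)
    D = np.argsort(resid, kind="stable")     # 0-based anti-ranks: resid[D[0]] is smallest
    XD = X[D]                                # rows x_{D_1}, ..., x_{D_n}
    full = np.cumsum(XD, axis=0)             # full[j-1] = sum_{i<=j} x_{D_i}
    best = 0.0
    for k in range(1, n + 1):
        Z = (D < k)[:, None]                 # Z_{D_i,k} = 1 iff D_i <= k (1-based)
        part = np.cumsum(XD * Z, axis=0)     # sum_{i<=j} Z_{D_i,k} x_{D_i}
        M = (X[:k].T @ X[:k]) @ XtX_inv      # (X_k'X_k)(X'X)^{-1}
        T = (part - full @ M.T) @ S          # row j-1 holds T_n^*(k/n, e_(j))'
        best = max(best, np.abs(T).max())
    return best
```

Passing a single column of ones as `X` returns $M_n$, the special case noted in the text. The loop costs on the order of $n^2 p$ operations, compared with roughly $n^3$ for evaluating (5) separately at every pair $(k, x)$.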
5      Local  Power  Analysis

We  consider  model  (2)  with  the  Kramer,  Ploberger,  and  Alt  (1988)  (henceforth  KPA) 
type  local  alternatives: 

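$$\beta_t = \beta + \Delta_1 g(t/n) n^{-1/2} \quad \text{and} \quad \epsilon_t^* = \epsilon_t(1 + \Delta_2 h(t/n) n^{-1/2})^{-1}, \tag{11}$$

where the $\epsilon_t$ are i.i.d. with distribution function $F$ and density function $f$. The functions
$g$ and $h$ are defined on $[0, 1]$ and are integrable. Define the vector function

$$\lambda_g(s) = \int_0^s g(v)\,dv - s \int_0^1 g(v)\,dv \tag{12}$$

and the function

$$\lambda_h(s) = \int_0^s h(v)\,dv - s \int_0^1 h(v)\,dv. \tag{13}$$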

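If $h$ is a simple shift function such that $h(x) = 0$ for $x < \tau$ and $h(x) = 1$ for $x \ge \tau$,
where $\tau \in (0, 1)$, then $\lambda_h(s) = -(\tau \wedge s)(1 - \tau \vee s)$, so that $|\lambda_h(s)| = (\tau \wedge s)(1 - \tau \vee s)$. The same is true for $\lambda_g$.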
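The simple-shift case can be checked numerically. The sketch below evaluates definition (13) by midpoint quadrature and compares it with $-(\tau \wedge s)(1 - \tau \vee s)$, the value obtained by direct integration; the function name `lambda_fn`, the grid size and the choice $\tau = 0.3$ are illustrative assumptions only.

```python
import numpy as np

def lambda_fn(h, s, m=20000):
    """lambda_h(s) = int_0^s h(v) dv - s * int_0^1 h(v) dv, by midpoint quadrature."""
    v = (np.arange(m) + 0.5) / m          # midpoints of m equal cells on [0, 1]
    hv = h(v)
    return hv[v <= s].sum() / m - s * hv.sum() / m

tau = 0.3
shift = lambda v: (v >= tau).astype(float)   # h = 0 before tau, 1 from tau on

for s in (0.2, 0.5, 0.9):
    closed_form = -min(tau, s) * (1 - max(tau, s))
    print(s, lambda_fn(shift, s), closed_form)
```

The drift $\lambda_h(s)$ is largest in absolute value at $s = \tau$, which is where the sample split coincides with the shift point.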

Theorem 2    Under assumptions (A.1)-(A.9) and the local alternatives (11), we have

$$M_n \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} |B(s,t) + \Delta_1 p(t)\bar{x}'\lambda_g(s) + \Delta_2 q(t)\lambda_h(s)| \tag{14}$$

and

$$M_n^* \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} \|B^*(s,t) + \Delta_1 p(t)Q^{1/2}\lambda_g(s) + \Delta_2 q(t)Q^{-1/2}\bar{x}\lambda_h(s)\|_\infty \tag{15}$$

where $p(t) = f(F^{-1}(t))$ and $q(t) = f(F^{-1}(t))F^{-1}(t)$.

The tests have nontrivial local power as long as $\lambda_g(s) \ne 0$ or $\lambda_h(s) \ne 0$ for some $s$. In
addition, $\lambda_g = \lambda_h = 0$ for all $s$ if and only if $g$ and $h$ are constant functions, implying
no change in the parameters.

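Corollary 2  (Changing regression parameters only). Under the assumptions of
Theorem 2,

$$M_n \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} |B(s,t) + \Delta_1 p(t)\bar{x}'\lambda_g(s)|,$$

$$M_n^* \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} \|B^*(s,t) + \Delta_1 p(t)Q^{1/2}\lambda_g(s)\|_\infty.$$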
The corollary is obtained by simply taking $\Delta_2 = 0$ in Theorem 2. In testing for changes
in the regression parameters, $M_n$ behaves like the CUSUM test of Brown, Durbin,
and Evans (1975) in the sense of lacking local power when the mean of the regressors, $\bar{x}$, is
orthogonal to the vector function $g$, as shown by KPA. The test $M_n^*$, however, does
have local power irrespective of the relationship between $\bar{x}$ and $g$. Thus it behaves like
the fluctuation test of Ploberger, Kramer and Kontrus (1989) (PKK). The drift term is
also similar in form to that of the fluctuation test.

There is a danger of misinterpreting the result of Corollary 2. Let us examine the
following model under the alternative hypothesis:

$$y_t = x_t'\beta + \Delta g(t/n) n^{-1/2} + \epsilon_t, \tag{16}$$

where $g$ is a scalar function. This model would be a special case of (11) provided
there is a constant regressor, but for now assume there is no constant regressor and
the mean regressor $\bar{x}$ is zero. Then it is $M_n^*$, not $M_n$, that has no local power, which seemingly
contradicts Corollary 2. This situation arises because there is a change in the parameter
of a regressor that is not considered under the null. To see this, we consider a more
general situation:

$$y_t = x_t'\beta + \Delta z_t' g(t/n) n^{-1/2} + \epsilon_t \tag{17}$$

where $z_t$ is $q \times 1$ and $g$ is a vector function. The $x_t$ are the only regressors under the
null of $\Delta = 0$. Suppose that $n^{-1} \sum_{t=1}^{[ns]} z_t \to s\bar{z}$ and $n^{-1} \sum_{t=1}^{[ns]} x_t z_t' \to sR_{xz}$ uniformly in
$s$, where $R_{xz}$ is some $p \times q$ matrix. Then

$$M_n \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} |B(s,t) + \Delta p(t)\bar{z}'\lambda_g(s)| \tag{18}$$

$$M_n^* \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} \|B^*(s,t) + \Delta p(t)Q^{-1/2}R_{xz}\lambda_g(s)\|_\infty. \tag{19}$$

Now let $z_t = 1$, so that (17) reduces to (16). Moreover, $\bar{z} = 1$ and $R_{xz} = \bar{x}$. Thus
if the mean value of the regressor is zero, $M_n^*$ has no local power but $M_n$ does. Of
course, for $z_t = x_t$, (18) and (19) coincide with Corollary 2.
Now taking $\Delta_1 = 0$ in Theorem 2 yields:
Corollary 3  (Changing scale only). Under the assumptions of Theorem 2,

$$M_n \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} |B(s,t) + \Delta_2 q(t)\lambda_h(s)|,$$

$$M_n^* \xrightarrow{d} \sup_{0 \le s \le 1} \sup_{0 \le t \le 1} \|B^*(s,t) + \Delta_2 q(t)Q^{-1/2}\bar{x}\lambda_h(s)\|_\infty.$$


When testing for a shift in the scale parameter, the situation is reversed from the
test of a shift in regression parameters: $M_n^*$ has no local power if the regressor mean,
$\bar{x}$, is zero, whereas $M_n$ has local power irrespective of this mean value.

In summary, the test $M_n$ has non-trivial local power when testing for changes in
the disturbances regardless of the mean value of the regressors. The test $M_n^*$ has
non-trivial local power when testing for changes in the regression parameter $\beta$, regardless
of the angle between the regressor and the structural shift; see KPA for comparison.
Moreover, $M_n$ also has local power for testing changes in $\beta$ and $M_n^*$ also has local
power for testing changes in the disturbances, except for some special circumstances
discussed earlier. In contrast, the conventional CUSUM test only has local power against
changes in regression parameters and the CUSUM-SQ test only has local power against
heteroskedasticity; see Ploberger and Kramer (1990). It is not clear how the fluctuation
test performs for a change in the disturbances, as it is not intended to be used for this
purpose.

The test statistics $M_n$ and $M_n^*$ are more powerful when used for testing the simple
shift alternatives. To fix ideas, consider the scale change alternatives as in Corollary
3. Let $H_a$ be the set of functions $h$ defined by

$$H_a = \Big\{h;\ 0 \le h \le 1,\ \int_0^1 h(v)\,dv = 1 - a\Big\}$$

for some $a$ satisfying $0 < a < 1$. The number $1 - a$ represents the average deviation
of $h$ from 0.

Consider the test statistic $M_n$. Since $B(s,t)$ is uniformly bounded in probability,
the value of $M_n$ is mainly determined by the drift term $\Delta_2 q(t)\lambda_h(s)$, for large
$|\Delta_2|$. Thus with high probability, in order for $T_n(s,x)$ to be maximized, the following
quantity needs to be maximized with respect to $s$:

$$\Big|\, s \int_0^1 h(v)\,dv - \int_0^s h(v)\,dv \,\Big|. \tag{20}$$

We determine the $h \in H_a$ and $s \in [0, 1]$ that maximize the objective function (20). It
is easy to show that there are two sets of solutions, depending on whether the quantity
inside the absolute value sign is positive or negative. One solution is given by

$$h^*(v) = \begin{cases} 0 & \text{if } v < a \\ 1 & \text{if } v \ge a \end{cases} \tag{21}$$

and $s^* = a$. The other solution is given by

$$h^*(v) = \begin{cases} 1 & \text{if } v \le 1 - a \\ 0 & \text{if } v > 1 - a \end{cases}$$

and $s^* = 1 - a$. Both of these $h^*$ imply a simple shift alternative, so the test is
more powerful against a simple shift. Furthermore, the value of the objective function
evaluated at the optimal solution is $a(1 - a)$ in both cases. But $a(1 - a)$ is maximized
for $a = 1/2$, implying a higher power for detecting a shift that occurs near the middle
of the observations.

To see that (21) is a solution, consider the objective function (20) without the absolute
value sign. For each fixed $s$, since the second term is non-positive, (20) will be
maximized by choosing $h(v) = 0$ for $v < s$. The objective function becomes $s \int_s^1 h(v)\,dv$
with $\int_s^1 h(v)\,dv = 1 - a$. To maximize the objective function, one needs to choose $s$ as
large as possible. In order to choose the largest $s$ such that $\int_s^1 h\,dv = 1 - a$, one needs
to choose $h$ as large as possible. Thus $h(v) = 1$ for $v \ge s$. The constraint becomes $\int_s^1 dv = 1 - a$,
which implies $s^* = a$. The second solution is obtained by maximizing the negative of the
function inside the absolute value sign.
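The optimality of the simple shift can be illustrated numerically. The sketch below maximizes the objective (20) over $s$ on a grid for two members of $H_a$ with $a = 1/2$: the step function (21) and, as an arbitrary comparison chosen only for this example, the ramp $h(v) = v$, which has the same integral. The function name and grid size are hypothetical.

```python
import numpy as np

def max_objective(h, m=2000):
    """max over s of | s * int_0^1 h - int_0^s h |, with s on the grid 1/m, ..., 1."""
    v = (np.arange(m) + 0.5) / m          # midpoints of m equal cells on [0, 1]
    cum = np.cumsum(h(v)) / m             # int_0^s h(v) dv at s = 1/m, 2/m, ..., 1
    s = np.arange(1, m + 1) / m
    return np.abs(s * cum[-1] - cum).max()

a = 0.5
step = lambda v: (v >= a).astype(float)   # simple shift (21)
ramp = lambda v: v                        # gradual change with the same integral 1 - a

print(max_objective(step))   # close to a * (1 - a) = 0.25
print(max_objective(ramp))   # close to 0.125
```

For the ramp the objective is $s(1 - s)/2$, so its maximum is half that of the step function, which is consistent with the claim that the drift, and hence the power, is largest for a simple shift.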

6      Trending  Regressors 

We  consider  the  following  model: 

$$y_t = z_t'\alpha + \gamma_0 + \gamma_1(t/n) + \cdots + \gamma_q(t/n)^q + \epsilon_t \tag{22}$$

where $z_t$ is an $r \times 1$ vector of stochastic regressors and $\{z_s; s \le t - 1\}$ are independent
of $\epsilon_t$. Let $x_t = (z_t', 1, t/n, \ldots, (t/n)^q)'$ be a $p \times 1$ vector, with $p = r + q + 1$.

The polynomial trends $\{(t/n)^i; 1 \le i \le q\}$ could be written without dividing
through by $n$. Writing them in this fashion saves notation by eliminating a weighting
matrix such as $\mathrm{diag}(n^{-1/2}, \ldots, n^{-(q+1/2)})$ that would otherwise be needed. We shall
maintain all assumptions (A.1)-(A.8) of Section 2, except changing (A.3) to

(A.3')

$$\operatorname{plim} \frac{1}{n} \sum_{t=1}^{[ns]} x_t x_t' = \lim \frac{1}{n} E \sum_{t=1}^{[ns]} x_t x_t' = Q(s), \qquad \text{uniformly in } s \in [0, 1]$$

where $Q(s)$ is positive definite for $s > 0$ and $Q(0) = 0$. If each element of $Q(s)$ is
a continuous function on $[0, 1]$, then one can show that pointwise convergence in $s$
implies uniform convergence in $s$. Assumption (A.3') actually admits a much wider
class of models than (22).

In the presence of trending regressors, only the weighted version, $M_n^*$, is
asymptotically distribution-free. We shall assume that there is a constant regressor. Let
$X_k = (x_1, x_2, \ldots, x_k)'$ and define $A_k$ as in (4) and $T_n^*$ as follows:

$$T_n^*\Big(\frac{k}{n}, x\Big) = (X'X)^{-1/2} \sum_{t=1}^{k} x_t I(\hat\epsilon_t \le x) - A_k (X'X)^{-1/2} \sum_{t=1}^{n} x_t I(\hat\epsilon_t \le x). \tag{23}$$

Again let $M_n^* = \max_k \sup_x \|T_n^*(k/n, x)\|_\infty$. The computation of $M_n^*$ is given by (10).
Note that $A_{[ns]} \xrightarrow{p} A(s) = Q(1)^{-1/2} Q(s) Q(1)^{-1/2}$ uniformly in $s$.
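To make the limit $A(s)$ concrete, the sketch below works through one simple case chosen for illustration: a constant and a linear trend, $x_t = (1, t/n)'$, for which $Q(s)$ has entries $s$, $s^2/2$ and $s^3/3$. It compares the sample matrix $A_{[ns]}$ with $Q(1)^{-1/2} Q(s) Q(1)^{-1/2}$. The function names are hypothetical and the symmetric inverse square root is an assumed choice.

```python
import numpy as np

def inv_sqrt(mat):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs / np.sqrt(vals)) @ vecs.T

def Q(s):
    """Limit of (1/n) sum_{t <= ns} x_t x_t' for x_t = (1, t/n)'."""
    return np.array([[s, s**2 / 2], [s**2 / 2, s**3 / 3]])

def A_limit(s):
    """A(s) = Q(1)^{-1/2} Q(s) Q(1)^{-1/2}."""
    R = inv_sqrt(Q(1.0))
    return R @ Q(s) @ R

def A_sample(n, s):
    """A_[ns] of (4) computed from the regressors x_t = (1, t/n)'."""
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t / n])
    k = int(np.floor(n * s))
    R = inv_sqrt(X.T @ X)
    return R @ (X[:k].T @ X[:k]) @ R

print(A_sample(5000, 0.4))
print(A_limit(0.4))   # differs from 0.4 * identity, unlike the non-trending case
```

In the non-trending case $Q(s) = sQ$ and $A(s)$ collapses to $s$ times the identity; here it does not, so the scalar $k/n$ cannot replace $A_k$.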

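Theorem 3    Under assumptions (A.1)-(A.8) with (A.3) replaced by (A.3'), we have

$$T_n^*\Big(\frac{[n\cdot]}{n}, \cdot\Big) \Rightarrow B^*(\cdot, F(\cdot))$$

where $B^*(s,u)$ is a vector Gaussian process defined on $[0, 1]^2$ with zero mean and
covariance matrix

$$E\{B^*(r,u)B^*(s,v)'\} = \{A(r \wedge s) - A(r)A(s)\}\{u \wedge v - uv\}.$$

Corollary 4    Under the assumptions of Theorem 3,

$$M_n^* \xrightarrow{d} \sup_{0 \le s, u \le 1} \|B^*(s,u)\|_\infty.$$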
The behavior of the test under the local alternatives (11) can again be analyzed.
Extending Lemma 4 of KPA, we can show that

$$\frac{1}{n} \sum_{t=1}^{[ns]} x_t x_t' g(t/n) \xrightarrow{p} \int_0^s \frac{dQ(v)}{dv} g(v)\,dv \tag{24}$$

and the convergence is uniform in $s$. The above integral exists if $g$ has bounded
variation on $[0, 1]$. When $Q(v) = vQ(1)$, (24) reduces to the result of KPA. Let

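$$\lambda_g^*(s) = \int_0^s \frac{dQ(v)}{dv} g(v)\,dv - Q(s)Q(1)^{-1} \int_0^1 \frac{dQ(v)}{dv} g(v)\,dv, \qquad \lambda_h^*(s) = \int_0^s \frac{dQ(v)}{dv} e\, h(v)\,dv - Q(s)Q(1)^{-1} \int_0^1 \frac{dQ(v)}{dv} e\, h(v)\,dv$$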

where  e  =  (1,0,  ...,0)'. 

Theorem 4    Under the local alternative (11),

$$M_n^* \xrightarrow{d} \sup_{0 \le s, u \le 1} \|B^*(s,u) + p(u)\Delta_1 Q(1)^{-1/2}\lambda_g^*(s) + q(u)\Delta_2 Q(1)^{-1/2}\lambda_h^*(s)\|_\infty$$

where $p(\cdot)$ and $q(\cdot)$ are given in Theorem 2.

Of  course,  when  Q(v)  =  vQ(l),  the  theorems  and  corollaries  in  previous  sections 
can  be  derived  from  the  results  of  this  section.  However,  the  limiting  distribution 
obtained  here  is  generally  regressor-dependent,  so  critical  values  of  the  tests  have  to 
be  found  case  by  case,  though  leading  cases  can  be  tabulated.  Also,  the  result  of  this 
section  requires  the  existence  of  a  constant  regressor. 

Tests allowing trending regressors have been proposed by MacNeill (1978), Sen (1980), Kim and Siegmund (1989), Hansen (1992), Chu and White (1992), Perron (1991), and Vogelsang (1992). Those tests are connected with partial sums, the likelihood ratio, or Wald-type statistics. It remains to be studied how the tests proposed in this paper perform relative to those in the literature.

7      Some  Comments 

We have maintained the assumption that the disturbances $\{\epsilon_t\}$ are independent r.v.'s. This could be extended to ARMA models. Although further extension to more general dependence structures such as mixing is also possible in terms of weak convergence, critical values of the tests are difficult to obtain because the limiting process will have a complicated correlation structure.² For ARMA models such as $\epsilon_t = B(L)u_t$, where $B(L)$ is a ratio of two polynomials of the lag operator $L$, one can still estimate $u_t$. This could be done with a two-step procedure. The first step involves estimating the regression coefficients $\beta$ and the second step estimating the coefficients of $B(L)$ using the first-step residuals. The two-step procedure yields estimates of $u_t$ from which the empirical distribution function can be constructed. Although details remain to be worked out, the results of the previous sections are expected to hold. Bai (1991) obtained similar results for pure ARMA models for the test $M_n$.
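The two-step procedure can be sketched in its simplest special case. The code below assumes one regressor plus an intercept, least squares in the first step, and AR(1) errors, $\epsilon_t = \rho\epsilon_{t-1} + u_t$, in the second; all three choices and the function names are illustrative assumptions rather than the paper's prescription, which allows any root-n consistent first-step estimator and a general $B(L)$.

```python
def ols_intercept_slope(x, y):
    """Least squares fit of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def two_step_innovations(x, y):
    # Step 1: estimate the regression coefficients and form residuals e_t.
    b0, b1 = ols_intercept_slope(x, y)
    e = [b - b0 - b1 * a for a, b in zip(x, y)]
    # Step 2: estimate the AR(1) coefficient from the first-step residuals.
    rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
           / sum(v * v for v in e[:-1]))
    # Estimated innovations u_t = e_t - rho * e_{t-1}, t = 2, ..., n.
    u = [e[t] - rho * e[t - 1] for t in range(1, len(e))]
    return rho, e, u

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 0.0, 2.0, 1.0, 3.0, 5.0]
rho, e, u = two_step_innovations(x, y)
```

The estimated innovations `u` would then replace the regression residuals when forming the empirical distribution function.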

The tests proposed in this paper are similar in form to the fluctuation test of PKK. In fact, consider regressing $I(\hat\epsilon_t \le x)$ on $x_t$ for $t = 1, \ldots, k$, and write

\[
\hat\theta^{(k)}(x) = (X_k'X_k)^{-1}\sum_{t=1}^{k} x_t I(\hat\epsilon_t \le x).
\]

Using the expression (23) for $T_n^*$, one obtains

\[
T_n^*(k/n, x) = A_k (X'X)^{1/2}\bigl(\hat\theta^{(k)}(x) - \hat\theta^{(n)}(x)\bigr).
\]

The quantity $\hat\theta^{(k)}(x)$ can be considered to be an estimate of $Q^{-1}\bar{x}F(x)$ using a partial sample, and $\hat\theta^{(n)}(x)$ an estimate using the whole sample. The test here has one more dimension than PKK's, and thus can be viewed as a two-dimensional fluctuation test. Notice that PKK's test does not include trending regressors but $T_n^*$ does. This comparison also suggests a way to extend PKK's test to trending regressors. Simply replace $t/T$ in their notation by the matrix $A_t$ and let

\[
S_n = \max_{p \le k \le n} \hat\sigma^{-1}\bigl\| A_k (X'X)^{1/2}\bigl(\hat\beta^{(k)} - \hat\beta^{(n)}\bigr) \bigr\|_\infty
\]

where $\hat\sigma$ and $\hat\beta^{(k)}$ ($k = p, \ldots, n$) are defined in PKK. Then it is not difficult to show

\[
S_n \xrightarrow{d} \sup_{0 \le s \le 1} \|B(s)\|_\infty
\]


2There  is  a  large  literature  on  empirical  process  based  on  mixing  sequences;  see  the  early  study 
of  Billingsley  (1968,  p.  240)  and  the  recent  study  of  Andrews  and  Pollard  (1990). 
where $B$ is a mean-zero vector Gaussian process with covariance matrix

\[
E\{B(u)B(v)'\} = A(u \wedge v) - A(u)A(v).
\]

Other issues that are left unaddressed in this paper include cointegrated regressors, and size and power comparisons with other tests in the literature.
A      Appendix

In  view  of  the  mathematical  structure  of  parameter  changes  in  regression  models,  we 
shall  first  present  and  prove  a  series  of  results  concerning  the  weak  convergence  of 
weighted  sequential  empirical  processes.  These  results  are  of  independent  interest  and 
will  be  used  subsequently  in  proving  the  theorems  stated  in  the  body  of  the  paper. 

Let $D_p^q[0,1]$ be the set of functions $f = (f_1, \ldots, f_p)$ defined on $[0,1]^q$ that are right continuous and have left limits. Endowed with the extended Skorohod $J_1$ topology, $D_p^q[0,1]$ is a separable and complete metric space, so that finite dimensional convergence plus tightness implies weak convergence for a sequence of random elements of $D_p^q[0,1]$; see Bickel and Wichura (1971). The space $D_p^q[0,1]$ for $p = q = 1$ is extensively studied by Billingsley (1968).

Theorem A.1 Let $U_1, U_2, \ldots, U_n$ be a sequence of i.i.d. uniformly distributed random variables on $[0,1]$ and $x_i$ ($i = 1, 2, \ldots, n$) be a sequence of random vectors satisfying assumptions (A.5) and (A.6). Assume that $U_i$ is independent of $x_j$ for $j \le i$. Then the process $Y_n(s,u)$ defined as

\[
Y_n(s,u) = n^{-1/2}\sum_{t=1}^{[ns]} x_t\{I(U_t \le u) - u\}
\]

with $Y_n(0,u) = Y_n(s,0) = 0$ is tight in $D_p^2[0,1]$.

Remarks: The process $Y_n$ is a multivariate and multiparameter process. The requirement of uniform distribution is only for convenience. The theorem holds for arbitrary i.i.d. random variables $\epsilon_t$. In this case, $I(U_t \le u) - u$ is replaced by $I(\epsilon_t \le x) - F(x)$ where $F$ is the distribution function of $\epsilon_t$. Then $Y_n(s,u)$ (with $u = F(x)$) is tight. In addition, the i.i.d. assumption on $U_i$ can be relaxed to a triangular array such that $U_{n1}, \ldots, U_{nn}$ are independent variables on $[0,1]$ with $U_{ni}$ having a d.f. $F_{ni}$ such that $\max_{1 \le i \le n} |F_{ni}(u_2) - F_{ni}(u_1)| \le C|u_2 - u_1|$, where $C$ is a generic constant. This claim can be easily seen from the proof.
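For intuition about the object in Theorem A.1, the sketch below evaluates one simulated path of $Y_n(s,u)$ on a coarse grid. It assumes scalar weights (the theorem allows vectors), and the trending weight sequence, the seed, and the function name `y_n` are illustrative choices, not taken from the paper.

```python
import math
import random

def y_n(s, u, x, draws):
    """Y_n(s, u) = n^{-1/2} * sum_{t <= [ns]} x_t * (I(U_t <= u) - u)."""
    n = len(x)
    k = math.floor(n * s)
    total = sum(x[t] * ((1.0 if draws[t] <= u else 0.0) - u) for t in range(k))
    return total / math.sqrt(n)

random.seed(0)
n = 200
x = [1.0 + t / n for t in range(1, n + 1)]   # hypothetical trending weights
draws = [random.random() for _ in range(n)]  # U_1, ..., U_n
grid = [i / 10 for i in range(11)]
surface = [[y_n(s, u, x, draws) for u in grid] for s in grid]
```

Each row of `surface` is a weighted empirical process in $u$ based on the first $[ns]$ observations; the process is zero at $s = 0$ and at $u = 1$.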
Lemma A.1 Assume the conditions of Theorem A.1 hold. Then there exists $K < \infty$ such that for all $s_1 \le s_2$ and $u_1 \le u_2$, where $0 \le s_i, u_i \le 1$ ($i = 1, 2$),

\[
E\|Y_n(s_2,u_2) - Y_n(s_1,u_2) - Y_n(s_2,u_1) + Y_n(s_1,u_1)\|^{2\gamma}
\le K(u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha} + n^{-(\gamma-1)}K(u_2-u_1)(s_2-s_1).
\]

Without loss of generality, one can assume that $\alpha < \gamma$, since $|u_2 - u_1| \le 1$ and $|s_2 - s_1| \le 1$. Moreover, when

\[
\tau n^{-\frac{\gamma-1}{2(\alpha-1)}} \le u_2 - u_1 \quad\text{and}\quad \tau n^{-\frac{\gamma-1}{2(\alpha-1)}} \le s_2 - s_1 \tag{25}
\]

for $\tau > 0$, then the lemma implies

\[
E\|Y_n(s_2,u_2) - Y_n(s_1,u_2) - Y_n(s_2,u_1) + Y_n(s_1,u_1)\|^{2\gamma}
\le K[1 + \tau^{-2(\alpha-1)}](u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha}. \tag{26}
\]

This inequality is analogous to (22.15) of Billingsley (1968, p. 198).
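The step from the lemma to (26) is a short computation. The display below is a sketch of it, reading the second term of the lemma's bound as $n^{-(\gamma-1)}K(u_2-u_1)(s_2-s_1)$ and using only (25) and $\alpha > 1$.

```latex
\begin{align*}
n^{-(\gamma-1)}(u_2-u_1)(s_2-s_1)
 &= n^{-(\gamma-1)}\bigl[(u_2-u_1)(s_2-s_1)\bigr]^{-(\alpha-1)}
    (u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha} \\
 &\le n^{-(\gamma-1)}\Bigl[\tau^{2}\, n^{-\frac{\gamma-1}{\alpha-1}}\Bigr]^{-(\alpha-1)}
    (u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha} \\
 &= \tau^{-2(\alpha-1)}(u_2-u_1)^{\alpha}(s_2-s_1)^{\alpha}.
\end{align*}
```

The inequality in the second line holds because (25) gives $(u_2-u_1)(s_2-s_1) \ge \tau^2 n^{-(\gamma-1)/(\alpha-1)}$ and the exponent $-(\alpha-1)$ is negative. Adding the first term of the lemma's bound yields the factor $K[1 + \tau^{-2(\alpha-1)}]$.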
Proof. Write $\eta_t = I(u_1 < U_t \le u_2) - u_2 + u_1$ and $Y^* = Y_n(s_2,u_2) - Y_n(s_1,u_2) - Y_n(s_2,u_1) + Y_n(s_1,u_1)$ for the moment. Then $Y^* = n^{-1/2}\sum_{i<t\le j} x_t\eta_t$ with $i = [ns_1]$ and $j = [ns_2]$. Note that $\{x_t\eta_t, \mathcal{F}_t\}$ is a sequence of (nonstationary) vector martingale differences, where $\mathcal{F}_t$ is the $\sigma$-field generated by $\ldots, x_t, x_{t+1}; \ldots, U_{t-1}, U_t$. By the inequality of Rosenthal (Hall and Heyde, 1980, p. 23), there exists a constant $M < \infty$ depending only on $\gamma$ and $p$ such that

\[
E\|Y^*\|^{2\gamma}
\le M E\Bigl\{\frac{1}{n}\sum_{i<t\le j} E\bigl((x_t'x_t)\eta_t^2 \mid \mathcal{F}_{t-1}\bigr)\Bigr\}^{\gamma}
 + M n^{-\gamma}\sum_{i<t\le j} E\bigl\{(x_t'x_t)^{\gamma}|\eta_t|^{2\gamma}\bigr\}. \tag{27}
\]

Note that $x_t$ is measurable with respect to $\mathcal{F}_{t-1}$ and $\eta_t$ is independent of $\mathcal{F}_{t-1}$. In addition, $E\eta_t^2 \le u_2 - u_1$ and $E|\eta_t|^{2\gamma} \le u_2 - u_1$. These results together with assumption (A.6) provide bounds for the two terms on the right of (27). The first term is bounded by

\[
M(u_2-u_1)^{\gamma} E\Bigl(\frac{1}{n}\sum_{i<t\le j} x_t'x_t\Bigr)^{\gamma} \le MK(u_2-u_1)^{\gamma}(s_2-s_1)^{\alpha}
\]

and the second term is bounded by

\[
M n^{-(\gamma-1)}(u_2-u_1)\frac{1}{n}\sum_{i<t\le j} E(x_t'x_t)^{\gamma} \le MK n^{-(\gamma-1)}(u_2-u_1)(s_2-s_1).
\]

Renaming $MK$ as $K$, the lemma then follows from $(u_2-u_1)^{\gamma} \le (u_2-u_1)^{\alpha}$, for $\gamma > \alpha$.

Lemma A.2    Under (A.5), we have for $s_1 \le s \le s_2$ and $u_1 \le u \le u_2$,

\[
\|Y_n(s,u) - Y_n(s_1,u_1)\| \le \|Y_n(s_2,u_2) - Y_n(s_1,u_1)\| + O_p(1)\, n^{1/2}[(u_2-u_1) + (s_2-s_1)]
\]

where the term $O_p(1)$ is uniform in $s$ ($s \ge s_1$), does not depend on $u$ and $u_1$, and satisfies

\[
P(|O_p(1)| > C) \le M/C^{2(1+\rho)}, \qquad \forall C > 0, \quad \text{for some } \rho > 0.
\]

Proof. First notice that all components of $x_t$ can be assumed to be nonnegative. Otherwise write $x_t = \sum_{i=1}^{p} x_t^+(i) - \sum_{i=1}^{p} x_t^-(i)$ where $x_t^+(i) = (0, \ldots, 0, x_{ti}, 0, \ldots, 0)'$ if $x_{ti} \ge 0$ and $x_t^-(i) = (0, \ldots, 0, -x_{ti}, 0, \ldots, 0)'$ if $x_{ti} < 0$, each being the zero vector otherwise. In this way, $Y_n$ can be written as a linear combination (with coefficients 1 or -1) of at most $2p$ processes, with each process having nonnegative weighting vectors. In addition, $\|x_t^+(i)\| \le \|x_t\|$ and $\|x_t^-(i)\| \le \|x_t\|$. So assumptions (A.5) and (A.6) are satisfied for $x_t^+(i)$ and $x_t^-(i)$. It is thus enough to assume that the $x_t$ are nonnegative. As a new piece of notation, for vectors $a$ and $b$, take $a \le b$ to mean $a_i \le b_i$ for all components. Since $x_t \ge 0$, the vector functions $x_t I(U_t \le u)$ and $x_t u$ are nondecreasing in $u$. It is easy to show

Yn(s,u)  -  Yn(suui)  <  Yn(s2,u2)  -  Yn(suu2) 

<  u2)  -  u2) 


1    M  /  1      [n«2] 

W/2(-j:xt)(u2-u)  +  n1'2  [-   £  xt{I(Ut 

n  (  =  1  \nt=[n5] 


and 


Yn{suux)-Yn{s,u)<nxl2{-Y,xt){u-u,)  +  nll2(-    jr    xt{I(Ut  <  u)  -  Ul} )  . 

n   t=l  \n  t=[nSi]  J 

The lemma follows from the boundedness of the indicator function and (A.5).       □

Remarks: Bickel and Wichura (1971) provided a general framework for showing the tightness of a sequence of multiparameter stochastic processes. Their conditions are hard to verify and probably do not hold because of the dependence and unboundedness of $x_t$. Although there are empirical process theories for mixing and nonstationary variables (see Andrews and Pollard (1990) and the references therein), none of them is directly applicable. Also, the presence of the $O_p(1)$ term in our Lemma A.2 seems to make it necessary for us to evaluate directly the modulus of continuity. A direct proof is also instructive. The arguments of Bickel and Wichura inspire the ideas used in the remaining proof.
Proof of Theorem A.1. Define

\[
\omega_\delta(Y_n) = \sup\{\|Y_n(s',u') - Y_n(s'',u'')\|;\ |s'-s''| < \delta,\ |u'-u''| < \delta,\ s',s'',u',u'' \in [0,1]\}.
\]

We shall show that for any $\epsilon > 0$ and $\eta > 0$, there exist a $\delta > 0$ and an integer $n_0$ such that

\[
P(\omega_\delta(Y_n) > \epsilon) < \eta, \qquad n > n_0.
\]

Since $[0,1]^2$ has only about $\delta^{-2}$ squares with side length $\delta$, it suffices to show that for every point $(s_1,u_1) \in [0,1]^2$, every $\epsilon > 0$ and $\eta > 0$, there exist a $\delta \in (0,1)$ and an integer $n_0$ such that

\[
P\Bigl(\sup_{\langle\delta\rangle} \|Y_n(s,u) - Y_n(s_1,u_1)\| > 5\epsilon\Bigr) < 2\delta^2\eta, \qquad n > n_0, \tag{28}
\]

where $\langle\delta\rangle = \{(s,u);\ s_1 \le s \le s_1+\delta,\ u_1 \le u \le u_1+\delta\} \cap [0,1]^2$.

For given $\delta > 0$ and $\eta > 0$, choose $C$ large enough so that for the $O_p(1)$ in Lemma A.2

\[
P(|O_p(1)| > C) < \delta^2\eta. \tag{29}
\]

By Lemma A.2 (see also (22.18) of Billingsley, 1968, p. 199), when $|O_p(1)| \le C$,

\[
\sup_{\langle\delta\rangle} \|Y_n(s,u) - Y_n(s_1,u_1)\| \le 3\max_{1 \le i,j \le m} \|Y_n(s_1 + i\epsilon_n, u_1 + j\epsilon_n) - Y_n(s_1,u_1)\| + 2\epsilon
\]

where $\epsilon_n = \epsilon/(n^{1/2}C)$ and $m = [n^{1/2}C\delta/\epsilon] + 1$. Write

\[
X(i,j) = Y_n(s_1 + i\epsilon_n, u_1 + j\epsilon_n) - Y_n(s_1,u_1).
\]
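The mesh used in this discretization can be checked numerically. The sketch below computes $\epsilon_n$ and $m$ for hypothetical values of $n$, $C$, $\delta$ and $\epsilon$ (the numbers and the function name are illustrative assumptions) and confirms the two facts the argument relies on: the grid spans the square of side $\delta$, with $m\epsilon_n \le 2\delta$, and the slack term $C n^{1/2}(\epsilon_n + \epsilon_n)$ from Lemma A.2 equals $2\epsilon$.

```python
import math

def grid_parameters(n, c, delta, eps):
    """Mesh width eps_n and number of grid steps m per axis."""
    eps_n = eps / (math.sqrt(n) * c)
    m = math.floor(math.sqrt(n) * c * delta / eps) + 1
    return eps_n, m

n, c, delta, eps = 10_000, 5.0, 0.1, 0.03
eps_n, m = grid_parameters(n, c, delta, eps)
span = m * eps_n                         # length covered along each axis
slack = c * math.sqrt(n) * 2 * eps_n     # bound on the O_p(1) term within a cell
```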
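Then

\[
P\Bigl(\sup_{\langle\delta\rangle} \|Y_n(s,u) - Y_n(s_1,u_1)\| > 5\epsilon\Bigr) \le P(|O_p(1)| > C) + P\Bigl(\max_{1 \le i,j \le m} \|X(i,j)\| > \epsilon\Bigr). \tag{30}
\]

Now for fixed $i$ and $k$ ($i > k$) write $Z(j) = X(i,j) - X(k,j)$. Notice that

\[
(\epsilon/C)\, n^{-\frac{\gamma-1}{2(\alpha-1)}} \le \epsilon/(Cn^{1/2}) = \epsilon_n \le j\epsilon_n, \qquad j \ge 1,
\]

which follows from $n^{-\frac{\gamma-1}{2(\alpha-1)}} \le n^{-1/2}$ because $1 < \alpha < \gamma$. By (25) and (26),

\[
E\|Z(j) - Z(l)\|^{2\gamma} \le K C_\epsilon [(i-k)\epsilon_n]^{\alpha}[(j-l)\epsilon_n]^{\alpha}, \qquad 1 \le l \le j \le m,
\]

where, from (26) with $\tau = \epsilon/C$,

\[
C_\epsilon = [1 + (C/\epsilon)^{2(\alpha-1)}] \le 2(C/\epsilon)^{2(\alpha-1)} \quad \text{for small } \epsilon. \tag{31}
\]

Thus by Theorem 12.2 of Billingsley (1968, p. 94), we have

\[
P\Bigl(\max_{1 \le j \le m} \|Z(j)\| > \epsilon\Bigr) \le \frac{K_1 K C_\epsilon}{\epsilon^{2\gamma}}[(i-k)\epsilon_n]^{\alpha}(m\epsilon_n)^{\alpha} \le \frac{K_2 C_\epsilon}{\epsilon^{2\gamma}}[(i-k)\epsilon_n]^{\alpha}\delta^{\alpha} \tag{32}
\]

where $K_1$ is a generic constant and $K_2 = 2^{\alpha}K_1K$. The last inequality follows from $m\epsilon_n \le 2\delta$ for large $n$. Because

\[
\Bigl|\max_j \|X(i,j)\| - \max_j \|X(k,j)\|\Bigr| \le \max_j \|X(i,j) - X(k,j)\| = \max_j \|Z(j)\|,
\]

if we let $V(i) = \max_j \|X(i,j)\|$, then (32) implies

\[
P(|V(i) - V(k)| > \epsilon) \le \frac{K_2 C_\epsilon}{\epsilon^{2\gamma}}[(i-k)\epsilon_n]^{\alpha}\delta^{\alpha}, \qquad 1 \le k < i \le m.
\]

Thus by Theorem 12.2 of Billingsley once again [let $\xi_h = V(h) - V(h-1)$, so that $V(i)$ is the partial sum $S_i$ of random variables $\xi_h$ in Billingsley's notation], we obtain

\[
P\Bigl(\max_{1 \le i \le m} |V(i)| > \epsilon\Bigr) \le \frac{K_1' K_2 C_\epsilon}{\epsilon^{2\gamma}}\delta^{\alpha}(m\epsilon_n)^{\alpha} \le \frac{K_3 C_\epsilon}{\epsilon^{2\gamma}}\delta^{2\alpha}
\]

where $K_1'$ is a generic constant and $K_3 = 2^{\alpha}K_1'K_2$. Note that $\max_i |V(i)| = \max_i \max_j \|X(i,j)\|$. Thus by (30)

\[
P\Bigl(\sup_{\langle\delta\rangle} \|Y_n(s,u) - Y_n(s_1,u_1)\| > 5\epsilon\Bigr) \le \delta^2\eta + \frac{K_3 C_\epsilon}{\epsilon^{2\gamma}}\delta^{2\alpha}.
\]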
By (31), the second term on the right-hand side above is bounded by

\[
\frac{K_3 C_\epsilon}{\epsilon^{2\gamma}}\delta^{2\alpha} \le \delta^2\,\frac{2K_3}{\epsilon^{2(\gamma+\alpha-1)}}(C\delta)^{2(\alpha-1)}. \tag{33}
\]

By Lemma A.2, one can choose $C = (M/\eta)^{1/[2(1+\rho)]}\delta^{-1/(1+\rho)}$ to assure (29), and the right-hand side of (33) becomes $\delta^2 K(\epsilon,\eta)\delta^{a}$, where $K(\epsilon,\eta)$ is a constant and $a = 2(\alpha-1)\rho/(1+\rho) > 0$. Choose $\delta$ such that $K(\epsilon,\eta)\delta^{a} < \eta$; then (28) follows. The proof of the theorem is completed.       □

Corollary A.1    Under assumptions (A.2), (A.3'), (A.5)-(A.6), the process $H_n$ defined as

\[
H_n(s,x) = (X'X)^{-1/2}\sum_{t=1}^{[ns]} x_t\{I(\epsilon_t \le x) - F(x)\}
\]

converges weakly to a Gaussian process $H$ with zero mean and covariance matrix

\[
E\{H(r,x)H(s,y)'\} = Q(1)^{-1/2}Q(r \wedge s)Q(1)^{-1/2}[F(x \wedge y) - F(x)F(y)]. \tag{34}
\]

Proof. $H_n(s,x) = (X'X/n)^{-1/2}Y_n(s,F(x))$ if one lets $U_i = F(\epsilon_i)$. Since $(X'X/n)$ converges in probability to the matrix $Q(1)$, the tightness of $H_n$ follows from Theorem A.1. The finite dimensional convergence to a normal distribution is obvious. To verify the covariance matrix, consider $r \le s$ and $u = F(x) \le v = F(y)$ and utilize the martingale property,

\[
E\{Y_n(r,u)Y_n(s,v)'\} = \frac{1}{n}E\Bigl(\sum_{t=1}^{[nr]} x_t x_t'\Bigr)(u - uv) \tag{35}
\]

which tends to $Q(r)(u - uv)$ by (A.3').       □

Corollary A.2 Under the assumptions of the previous corollary, the process $V_n$ defined as

\[
V_n(s,x) = H_n(s,x) - A_{[ns]}H_n(1,x)
\]

converges weakly to a Gaussian process $V$ with mean zero and covariance matrix

\[
E\{V(r,u)V(s,v)'\} = \{A(r \wedge s) - A(r)A(s)\}\{u \wedge v - uv\}. \tag{36}
\]

Proof. The tightness of $V_n$ follows from the tightness of $H_n$ and the convergence of $A_{[ns]}$ to a deterministic matrix $A(s)$ uniformly in $s$. The limiting process of $V_n$ is, by Corollary A.1,

\[
V(s,x) = H(s,x) - A(s)H(1,x).
\]

Now (36) follows easily from (34).         □

Note that (A.3) is a special case of (A.3'). When $Q(s) = sQ$ for some $Q > 0$, the covariance matrix of $V$ then becomes $(r \wedge s - rs)\{F(x \wedge y) - F(x)F(y)\}I$ where $I$ is the $p \times p$ identity matrix, yielding a multivariate Kiefer process with independent components.
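In the special case $Q(s) = sQ$ the covariance of each component is the product of two Brownian-bridge kernels, one in time and one in the probability scale. A minimal sketch, with the function name chosen here for illustration and the arguments `fx`, `fy` standing for $F(x)$ and $F(y)$:

```python
def kiefer_cov(r, s, fx, fy):
    """(r ^ s - r s) * (F(x ^ y) - F(x) F(y)) for one component.

    Since F is nondecreasing, F(x ^ y) equals min(F(x), F(y)).
    """
    return (min(r, s) - r * s) * (min(fx, fy) - fx * fy)

mid_point = kiefer_cov(0.5, 0.5, 0.5, 0.5)
at_boundary = kiefer_cov(1.0, 0.3, 0.2, 0.7)
```

The kernel is largest at the center of the unit square and vanishes whenever $r$ or $s$ equals 0 or 1, or $F$ equals 0 or 1, which is what makes the limit free of both the regressors and $F$ in this case.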

We next study the asymptotic behavior of the residual empirical process. Under model (1), $\hat\epsilon_t \le z$ if and only if $\epsilon_t \le z + x_t'(\hat\beta - \beta)$, thus the residual s.e.p. $K_n^*$ is given by

\[
K_n^*(s,z) = (X'X)^{-1/2}\sum_{t=1}^{[ns]} x_t\{I(\epsilon_t \le z + x_t'(\hat\beta - \beta)) - F(z)\}.
\]

Under the local alternative of (11), $\hat\epsilon_t \le z$ if and only if

\[
\epsilon_t \le z\{1 + \Delta_2 h(t/n)n^{-1/2}\} + \{x_t'(\hat\beta - \beta) + \Delta_1 x_t'g(t/n)n^{-1/2}\}\{1 + \Delta_2 h(t/n)n^{-1/2}\}.
\]

Thus $K_n^*$ becomes

\[
K_n^*(s,z) = (X'X)^{-1/2}\sum_{t=1}^{[ns]} x_t\{I(\epsilon_t \le z(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z)\} \tag{37}
\]

where

\[
a_t = \Delta_2 h(t/n), \quad\text{and}\quad b_t = \{x_t'\sqrt{n}(\hat\beta - \beta) + \Delta_1 x_t'g(t/n)\}\{1 + \Delta_2 h(t/n)n^{-1/2}\}. \tag{38}
\]

If the weights $x_t$ in (37) are set to 1, then $K_n^*$ is just the non-weighted s.e.p. of residuals. We shall introduce a more general process that can accommodate all the above cases, and examine the asymptotic behavior of this general process.

Let $a = (a_1, a_2, \ldots, a_n)$, $b = (b_1, b_2, \ldots, b_n)$ be two $1 \times n$ random vectors, and $C = (c_1, c_2, \ldots, c_n)'$ be an $n \times q$ random matrix ($q \ge 1$). Define

\[
K_n(s,z,a,b) = (C'C)^{-1/2}\sum_{t=1}^{[ns]} c_t\{I(\epsilon_t \le z(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z)\}.
\]
For $c_t = x_t$, $a_t = 0$, and $b_t = x_t'n^{1/2}(\hat\beta - \beta)$, or $a_t$ and $b_t$ in (38), we have

\[
K_n(s,z,a,b) = K_n^*(s,z) \quad\text{and moreover,}\quad K_n(s,z,0,0) = H_n(s,z). \tag{39}
\]

Define

\[
Z_n(s,z,a,b) = \frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{I(\epsilon_t \le z(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z(1 + a_t n^{-1/2}) + b_t n^{-1/2})\}.
\]

Assume

(B.1) The variable $\epsilon_t$ is independent of $\mathcal{F}_{t-1}$, where

\[
\mathcal{F}_{t-1} = \sigma\text{-field}\{a_{s+1}, b_{s+1}, c_{s+1}, \epsilon_s;\ s \le t-1\}.
\]

(B.2) $n^{-1}\sum_{t=1}^{n} \|c_t\| = O_p(1)$.

(B.3) $n^{-1/2}\max_{1 \le i \le n} |\eta_i| = o_p(1)$, for $\eta_i = a_i, b_i$.

(B.4) There exist a $\gamma > 1$ and $A < \infty$ such that for all $n$

\[
E\Bigl\{\frac{1}{n}\sum_{t=1}^{n} \|c_t\|^2(|a_t| + |b_t|)\Bigr\}^{\gamma} < A \quad\text{and}\quad \frac{1}{n}\sum_{t=1}^{n} E\{\|c_t\|^2(|a_t| + |b_t|)\}^{\gamma} < A.
\]

(B.5) Conditions (B.3) and (B.4) with $|b_t|$ replaced by $\|x_t\|$.

Note that under (B.1) the summands in $Z_n$ are conditionally centered.

Theorem A.2   Under the assumptions of (A.1), (B.1)-(B.5),

\[
K_n(s,z,a,b) = K_n(s,z,0,0) + (C'C/n)^{-1/2}\Bigl\{f(z)z\Bigl(\frac{1}{n}\sum_{t=1}^{[ns]} c_t a_t\Bigr) + f(z)\Bigl(\frac{1}{n}\sum_{t=1}^{[ns]} c_t b_t\Bigr)\Bigr\} + o_p(1)
\]

where the $o_p(1)$ is uniform in $s$ and in $z$, and for $b_t = x_t'\alpha$, the $o_p(1)$ is also uniform in $\alpha \in D$, an arbitrary compact set of $R^p$. In particular, the result holds for $b_t = x_t'n^{1/2}(\hat\beta - \beta)$ as long as $n^{1/2}(\hat\beta - \beta) = O_p(1)$.

Proof: By adding and subtracting terms,

\[
\begin{aligned}
K_n(s,z,a,b) = {} & K_n(s,z,0,0) \\
& + (C'C/n)^{-1/2}\{Z_n(s,z,a,b) - Z_n(s,z,0,0)\} \\
& + (C'C)^{-1/2}\sum_{t=1}^{[ns]} c_t\{F(z(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z)\}.
\end{aligned}
\]

Theorem A.2 now follows from Theorem A.3(i) and (ii) below and the Taylor series expansion.         □

Theorem A.3   (i) Under assumptions (A.1) and (B.1)-(B.4),

\[
\sup_{0 \le s \le 1,\ z \in R} \|Z_n(s,z,a,b) - Z_n(s,z,0,0)\| = o_p(1).
\]

(ii) Let $b_t = x_t'\alpha$ for $\alpha$ in a compact set $D$ of $R^p$ and denote $b(\alpha) = (x_1'\alpha, \ldots, x_n'\alpha)$. Then under assumptions (A.1) and (B.1), (B.2) and (B.5)

\[
\sup_{\alpha \in D}\ \sup_{0 \le s \le 1,\ z \in R} \|Z_n(s,z,a,b(\alpha)) - Z_n(s,z,0,0)\| = o_p(1).
\]

(iii) Let $a_t = r_t'\tau$; $r_t, \tau \in R^{\ell}$ for some $\ell \ge 1$; $\tau \in S$, a compact set. Denote $a(\tau) = (r_1'\tau, \ldots, r_n'\tau)$. Assume (B.3) and (B.4) hold with $|a_t| = \|r_t\|$. Then under (A.1), (B.1) and (B.2)

\[
\sup_{\tau \in S}\ \sup_{\alpha \in D}\ \sup_{0 \le s \le 1,\ z \in R} \|Z_n(s,z,a(\tau),b(\alpha)) - Z_n(s,z,0,0)\| = o_p(1).
\]

Note that part (i) is a special case of part (ii). Similarly, (ii) is a special case of (iii). However, each of the latter is also a consequence of its former, as will be shown. Part (ii) allows $b_t$ to depend, in a particular way, on the entire data set. An example is $b_t = x_t'\sqrt{n}(\hat\beta - \beta)$ as long as $\sqrt{n}(\hat\beta - \beta) = O_p(1)$. Similarly, part (iii) allows the scale parameter to be estimated. In our application, part (ii) is all that is required. To prove the theorem, we need the following lemma.

Lemma A.3    Under assumptions (A.1) and (B.1)-(B.4), for every $d \in (0, 1/2)$

\[
\sup \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \|c_t F(y_t^*) - c_t F(z_t^*)\| = o_p(1)
\]

where $y_t^* = y(1 + a_t n^{-1/2}) + b_t n^{-1/2}$, $z_t^* = z(1 + a_t n^{-1/2}) + b_t n^{-1/2}$ and the supremum extends over all pairs $(y,z)$ such that $|F(y) - F(z)| \le n^{-1/2-d}$.

Proof: Follows from the mean value theorem.        □

Proof of (i).    Let $N(n)$ be an integer such that $N(n) = [n^{1/2+d}] + 1$, where $d$ is defined in Lemma A.3. Following the arguments of Boldin (1982), divide the real line into $N(n)$ parts by points $-\infty = z_0 < z_1 < \cdots < z_{N(n)} = \infty$ with $F(z_i) = iN(n)^{-1}$. As explained in the proof of Lemma A.2, there is no loss of generality in assuming $c_t \ge 0$. Then $c_t I(\epsilon_t \le z)$ and $c_t F(z)$ are nondecreasing. Thus when $z_r \le z \le z_{r+1}$, we have

\[
\begin{aligned}
Z_n(s,z,a,b) - Z_n(s,z,0,0) \le {} & Z_n(s,z_{r+1},a,b) - Z_n(s,z_{r+1},0,0) \\
& + \frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{I(\epsilon_t \le z_{r+1}) - F(z_{r+1}) - I(\epsilon_t \le z) + F(z)\} \\
& + \frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{F(z_{r+1}(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z(1 + a_t n^{-1/2}) + b_t n^{-1/2})\}.
\end{aligned}
\]


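The reverse inequality holds when $z_{r+1}$ is replaced by $z_r$. Therefore, by the inequality $|y| \le \max(|c|, |d|)$ for $c \le y \le d$,

\[
\begin{aligned}
\sup_{s,z} \|Z_n(s,z,a,b) - Z_n(s,z,0,0)\| \le {} & \max_r \sup_s \|Z_n(s,z_r,a,b) - Z_n(s,z_r,0,0)\| \\
& + \sup_{s,\ |u-v| \le N(n)^{-1}} \Bigl\|\frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{I(F(\epsilon_t) \le u) - u - I(F(\epsilon_t) \le v) + v\}\Bigr\| \\
& + \max_r \sup_s \Bigl\|\frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{F(z_{r+1}(1 + a_t n^{-1/2}) + b_t n^{-1/2}) - F(z_r(1 + a_t n^{-1/2}) + b_t n^{-1/2})\}\Bigr\|.
\end{aligned}
\]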

Because $\|\sum_{t=1}^{[ns]} \cdot\| \le \sum_{t=1}^{n} \|\cdot\|$ and $|F(z_{r+1}) - F(z_r)| \le n^{-1/2-d}$ by construction, the last term on the right is $o_p(1)$ by Lemma A.3. The second last term is $o_p(1)$ because of Theorem A.1. It remains to show

\[
\max_{0 \le r \le N(n)}\ \max_{1 \le j \le n} \|Z_n^*(j/n, z_r)\| = o_p(1) \tag{40}
\]

where $Z_n^*(j/n, z_r) := Z_n(s,z_r,a,b) - Z_n(s,z_r,0,0)$ with $s = j/n$. But

\[
P\Bigl(\max_{0 \le r \le N(n)}\ \max_{1 \le j \le n} \|Z_n^*(j/n, z_r)\| > \epsilon\Bigr) \le N(n)\max_r P\Bigl(\max_j \|Z_n^*(j/n, z_r)\| > \epsilon\Bigr).
\]

The remaining task is to bound the above probability. Let

\[
\xi_t = c_t\{I(\epsilon_t \le z_r(1 + n^{-1/2}a_t) + n^{-1/2}b_t) - F(z_r(1 + n^{-1/2}a_t) + n^{-1/2}b_t) - I(\epsilon_t \le z_r) + F(z_r)\};
\]

then $(\xi_t, \mathcal{F}_t)$ is an array of martingale differences and

\[
Z_n^*(j/n, z_r) = n^{-1/2}\sum_{t=1}^{j} \xi_t.
\]
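By the Doob inequality,

\[
P\Bigl(\max_j \Bigl\|n^{-1/2}\sum_{t=1}^{j} \xi_t\Bigr\| > \epsilon\Bigr) \le \epsilon^{-2\gamma}M_1 E\Bigl\|n^{-1/2}\sum_{t=1}^{n} \xi_t\Bigr\|^{2\gamma}, \tag{41}
\]

where $M_1$ is a constant depending only on $p$ and $\gamma$. By the Rosenthal inequality (Hall and Heyde, 1980, p. 23), there exists $M_2 > 0$ such that

\[
E\Bigl(\Bigl\|\sum_{t=1}^{n} \xi_t\Bigr\|\Bigr)^{2\gamma} \le M_2 E\Bigl\{\sum_{t=1}^{n} E(\|\xi_t\|^2 \mid \mathcal{F}_{t-1})\Bigr\}^{\gamma} + M_2\sum_{t=1}^{n} E\|\xi_t\|^{2\gamma} \tag{42}
\]

for all $n$. Because $(a_i, b_i, c_i)$ is measurable with respect to $\mathcal{F}_{i-1}$ and $\epsilon_i$ is independent of $\mathcal{F}_{i-1}$ by (B.1),

\[
E(\|\xi_i\|^2 \mid \mathcal{F}_{i-1}) \le \|c_i\|^2\,|F(z_r(1 + a_i n^{-1/2}) + b_i n^{-1/2}) - F(z_r)| \le \frac{L}{\sqrt{n}}\|c_i\|^2(|a_i| + |b_i|)
\]

where $L$ is an upper bound for both $|f(x)|$ and $|xf(x)|$ for all $x$. Using the above inequality and $E\|\xi_t\|^{2\gamma} = E\{E(\|\xi_t\|^{2\gamma} \mid \mathcal{F}_{t-1})\}$, we have

\[
E\|\xi_t\|^{2\gamma} \le n^{-\gamma/2}L^{\gamma}E\{\|c_t\|^2(|a_t| + |b_t|)\}^{\gamma}.
\]

By (42), for $M_3 = M_2L^{\gamma}$,

\[
\begin{aligned}
E\Bigl(\Bigl\|n^{-1/2}\sum_{t=1}^{n} \xi_t\Bigr\|\Bigr)^{2\gamma} \le {} & M_3 n^{-\gamma/2}E\Bigl\{\frac{1}{n}\sum_{t=1}^{n} \|c_t\|^2(|a_t| + |b_t|)\Bigr\}^{\gamma} \\
& + M_3 n^{-\gamma/2-(\gamma-1)}\frac{1}{n}\sum_{t=1}^{n} E\{\|c_t\|^2(|a_t| + |b_t|)\}^{\gamma} \\
\le {} & 2M_3An^{-\gamma/2}.
\end{aligned}
\]

The last inequality follows from assumption (B.4). The above bound does not depend on $z_r$. Thus for $M_4 = 2M_1M_3A$,

\[
P\Bigl(\max_r \max_j \|Z_n^*(j/n, z_r)\| > \epsilon\Bigr) \le \epsilon^{-2\gamma}M_4N(n)n^{-\gamma/2} \approx \epsilon^{-2\gamma}M_4n^{-(\gamma-1)/2+d}
\]

because $N(n)$ is of order $n^{1/2+d}$. The above is $o(1)$ if we choose $d \in (0, (\gamma-1)/2)$ in Lemma A.3. The proof of (i) is completed.        □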
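The partition used at the start of the proof of (i) is easy to construct once $F$ is known. The sketch below assumes, purely for illustration, that $F$ is the standard normal distribution function; the function name and the values $n = 100$, $d = 0.25$ are hypothetical (and $d = 0.25$ is admissible only if $\gamma > 1.5$).

```python
import math
from statistics import NormalDist

def partition_points(n, d, dist=NormalDist()):
    """Points z_0 < ... < z_N with F(z_i) = i / N and N = [n^(1/2+d)] + 1."""
    big_n = math.floor(n ** (0.5 + d)) + 1
    interior = [dist.inv_cdf(i / big_n) for i in range(1, big_n)]
    return big_n, [-math.inf] + interior + [math.inf]

big_n, points = partition_points(n=100, d=0.25)
cell_mass = 1.0 / big_n   # F(z_{r+1}) - F(z_r) for every cell
```

Every cell carries probability $1/N(n) \le n^{-1/2-d}$, which is the spacing condition under which Lemma A.3 applies.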

Proof of (ii). This really follows from the compactness of $D$. The proof is standard; see Koul (1991), for example. Since $D$ is compact, for any $\delta > 0$, the set $D$ can be partitioned into a finite number of subsets such that the diameter of each subset is not greater than $\delta$. Denote these subsets by $D_1, D_2, \ldots, D_{m(\delta)}$. Fix $k$ and consider $D_k$. Pick $\alpha_k \in D_k$. For all $\alpha \in D_k$

\[
(x_t'\alpha_k - \delta\|x_t\|) \le x_t'\alpha \le (x_t'\alpha_k + \delta\|x_t\|)
\]

because $\|\alpha_k - \alpha\| \le \delta$. Thus if we define the vector $b(k,\lambda) = (x_1'\alpha_k + \lambda\|x_1\|, \ldots, x_n'\alpha_k + \lambda\|x_n\|)$ then, assuming again $c_t \ge 0$ for all $t$, we have for all $\alpha \in D_k$, by the monotonicity of $c_t I(\epsilon_t \le z)$,

\[
\begin{aligned}
Z_n(s,z,a,b(\alpha)) \le {} & Z_n(s,z,a,b(k,\delta)) \\
& + \frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]} c_t\{F(z(1 + a_t n^{-1/2}) + (x_t'\alpha_k + \delta\|x_t\|)n^{-1/2}) - F(z(1 + a_t n^{-1/2}) + x_t'\alpha n^{-1/2})\}
\end{aligned}
\]

and a reversed inequality holds when $\delta$ is replaced by $-\delta$. Using the mean value theorem and assumption (A.1), it is easy to verify that the second term on the right is bounded (with respect to the norm $\|\cdot\|$) by $\delta O_p(1)$, where the $O_p(1)$ is uniform in all $s \in [0,1]$, all $z \in R$, and all $\alpha \in D$. Thus

\[
\begin{aligned}
\sup_{\alpha}\sup_{s,z} \|Z_n(s,z,a,b(\alpha)) - Z_n(s,z,0,0)\| \le {} & \max_k \sup_{s,z} \|Z_n(s,z,a,b(k,\delta)) - Z_n(s,z,0,0)\| \\
& + \max_k \sup_{s,z} \|Z_n(s,z,a,b(k,-\delta)) - Z_n(s,z,0,0)\| + \delta O_p(1)
\end{aligned}
\]

where the supremums are taken over $\alpha \in D$, $s \in [0,1]$, $z \in R$, and $k \le m(\delta)$, respectively. The term $\delta O_p(1)$ can be made arbitrarily small in probability by choosing a small $\delta$. Once $\delta$ is chosen, $m(\delta)$ will be a bounded integer. The first two terms on the right hand side are then $o_p(1)$ by part (i).        □

Proof  of  (iii).  Follows  from  the  same  type  of  arguments  as  in  the  proof  of  (ii). 
Instead  of  using  the  result  of  part  (i),  one  uses  the  result  of  part  (ii).  The  proof  of  the 
theorem  is  now  completed.        □ 
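The compactness step in the proof of (ii) rests on covering $D$ by finitely many pieces of diameter at most $\delta$. As an illustration, the sketch below does this for a rectangular box, which is an assumption made here (the paper allows any compact $D$): cubes of side $\delta/\sqrt{p}$ have diameter $\delta$, and their centers can serve as the points $\alpha_k$. The function name `delta_net` is hypothetical.

```python
import itertools
import math

def delta_net(bounds, delta):
    """Centers of cubes of side delta/sqrt(p) covering the box `bounds`."""
    p = len(bounds)
    side = delta / math.sqrt(p)          # cube diameter is side * sqrt(p) = delta
    axes = []
    for lo, hi in bounds:
        count = max(1, math.ceil((hi - lo) / side))
        axes.append([lo + (i + 0.5) * side for i in range(count)])
    return list(itertools.product(*axes))

centers = delta_net([(-1.0, 1.0), (-1.0, 1.0)], delta=0.5)

def distance_to_net(point):
    return min(math.dist(point, c) for c in centers)
```

The number of centers, $m(\delta)$, grows as $\delta$ shrinks but is finite for each fixed $\delta$, which is all the argument needs.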
We are now in the position to prove Theorems 1 through 4. Conditions required for the preliminary results, Theorem A.1 to Theorem A.3 and their corollaries, are all satisfied under (A.1)-(A.9) for various choices of $a_t$, $b_t$ and $c_t$ below. Conditions (A.3) and (A.9) can be replaced by (A.3') when weighted empirical processes are under consideration.

Proof of Theorem 1 and Theorem 3. Under the null hypothesis, $\hat\epsilon_t = \epsilon_t - x_t'(\hat\beta - \beta)$ so $\hat\epsilon_t \le z$ if and only if $\epsilon_t \le z + x_t'(\hat\beta - \beta)$. Apply Theorem A.2 with $a_t = 0$, $b_t = x_t'\sqrt{n}(\hat\beta - \beta)$, and $c_t = x_t$; in view of (39),

\[
\begin{aligned}
K_n^*(s,z) - A_{[ns]}K_n^*(1,z) = {} & H_n(s,z) - A_{[ns]}H_n(1,z) && (43) \\
& + f(z)(X'X/n)^{-1/2}\frac{1}{n}\sum_{t=1}^{[ns]} x_t b_t - f(z)A_{[ns]}(X'X/n)^{-1/2}\frac{1}{n}\sum_{t=1}^{n} x_t b_t && (44) \\
& + o_p(1). && (45)
\end{aligned}
\]

Expression (44) is identically zero for all $s \in [0,1]$ when $b_t = x_t'\sqrt{n}(\hat\beta - \beta)$. That is, the drift terms of $K_n^*(s,z)$ and $A_{[ns]}K_n^*(1,z)$ cancel out. Theorem 3 now follows from Corollary A.2. Theorem 1(ii) follows as a special case. To prove Theorem 1(i), take the weights $c_t = 1$ and $A_{[ns]} = [ns]/n$ in the above proof; then (44) becomes

\[
f(z)\frac{1}{n}\sum_{t=1}^{[ns]} b_t - f(z)\frac{[ns]}{n}\frac{1}{n}\sum_{t=1}^{n} b_t = f(z)\Bigl(\frac{1}{n}\sum_{t=1}^{[ns]} x_t - \frac{[ns]}{n}\frac{1}{n}\sum_{t=1}^{n} x_t\Bigr)'\sqrt{n}(\hat\beta - \beta), \tag{46}
\]

which is $o_p(1)$ under assumptions (A.7) and (A.9). The limiting process of $H_n(s,z) - A_{[ns]}H_n(1,z)$ reduces to the one stated in Theorem 1(i) when $x_t = 1$ for all $t$.       □
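The exact cancellation of the drift in (44) can be verified numerically. The sketch below does so for a single scalar regressor, with the factor $f(z)$ dropped and with a hypothetical value `c` standing in for $\sqrt{n}(\hat\beta - \beta)$; the function name and the data are illustrative assumptions.

```python
def drift_44(x, k, c):
    """Scalar version of (44) without f(z), for b_t = x_t * c."""
    n = len(x)
    b = [xt * c for xt in x]
    sxx = sum(xt * xt for xt in x)                 # X'X
    a_k = sum(xt * xt for xt in x[:k]) / sxx       # A_k
    scale = (sxx / n) ** -0.5                      # (X'X/n)^{-1/2}
    partial = scale * sum(xt * bt for xt, bt in zip(x[:k], b[:k])) / n
    full = scale * sum(xt * bt for xt, bt in zip(x, b)) / n
    return partial - a_k * full

x = [1.0, 2.0, 3.0, 4.0, 5.0]                      # a trending regressor
drifts = [drift_44(x, k, c=0.7) for k in range(1, len(x) + 1)]
```

The cancellation is algebraic, since $A_k(X'X)^{-1/2}X'X = (X'X)^{-1/2}X_k'X_k$, so it holds for every $k$ and for trending regressors alike; this is what the weighting by $x_t$ buys relative to the unweighted process, whose drift (46) vanishes only asymptotically.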
Proof of Theorem 2. Under the local alternatives (11), $K_n^*$ is given by (37) with $a_t$ and $b_t$ given by (38). Note that under these local alternatives, the root-n consistency of $\hat\beta$ generally prevails. For example, assuming the $\epsilon_t$ have a finite variance, the least squares estimator of $\beta$ is still root-n consistent. The root-n consistency allows us to obtain a non-explosive limit (otherwise the tests will be consistent even for local changes). Note that $b_t$ is dominated by $x_t'\sqrt{n}(\hat\beta - \beta) + \Delta_1 x_t'g(t/n)$, with the remaining term being negligible in the limit. Moreover, when $b_t = x_t'\sqrt{n}(\hat\beta - \beta)$, from the previous proof, the drift term of $K_n^*(s,z) - A_{[ns]}K_n^*(1,z)$ is negligible for either $c_t = x_t$ or $c_t = 1$. We can thus assume $b_t = \Delta_1 x_t'g(t/n)$. Let $c_t = x_t$. Now by Theorem A.2, for $a_t = \Delta_2 h(t/n)$,

\[
\begin{aligned}
K_n^*(s,z) - A_{[ns]}K_n^*(1,z) = {} & H_n(s,z) - A_{[ns]}H_n(1,z) \\
& + f(z)z\Delta_2\Bigl\{\Bigl(\frac{X'X}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{[ns]} x_t h(t/n) - A_{[ns]}\Bigl(\frac{X'X}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{n} x_t h(t/n)\Bigr\} && (47) \\
& + f(z)\Delta_1\Bigl\{\Bigl(\frac{X'X}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{[ns]} x_t x_t'g(t/n) - A_{[ns]}\Bigl(\frac{X'X}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{n} x_t x_t'g(t/n)\Bigr\} && (48) \\
& + o_p(1).
\end{aligned}
\]

By the results of KPA, under (A.3) and (A.9)

\[
\frac{1}{n}\sum_{t=1}^{[ns]} x_t h(t/n) \xrightarrow{p} \bar{x}\int_0^s h(v)\,dv, \tag{49}
\]

\[
\frac{1}{n}\sum_{t=1}^{[ns]} x_t x_t'g(t/n) \xrightarrow{p} Q\int_0^s g(v)\,dv. \tag{50}
\]

Furthermore, under assumption (A.3),

\[
A_{[ns]}(X'X/n)^{-1/2} \xrightarrow{p} sQ^{-1/2}. \tag{51}
\]

From these results, (47) converges to $f(z)z\Delta_2 Q^{-1/2}\bar{x}\lambda_h(s)$ where $\lambda_h$ is given by (13); and similarly, (48) converges to $f(z)\Delta_1 Q^{1/2}\lambda_g(s)$ where $\lambda_g$ is given by (12). Thus (15) is obtained, and (14) is obtained similarly by choosing $c_t = 1$. The proof is completed.       □

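Proof of (18) and (19). Now $\hat\epsilon_t \le z$ if and only if $\epsilon_t \le z + x_t'(\hat\beta - \beta) + \Delta z_t'g(t/n)n^{-1/2}$. Let $a_t = 0$, $b_t = x_t'\sqrt{n}(\hat\beta - \beta) + \Delta z_t'g(t/n)$. Again we can ignore $x_t'\sqrt{n}(\hat\beta - \beta)$ in $b_t$ and assume $b_t = \Delta z_t'g(t/n)$. For $c_t = 1$ or $c_t = x_t$, the drift term of $K_n^*(s,z) - A_{[ns]}K_n^*(1,z)$ is given by

\[
f(z)\Delta\Bigl\{\Bigl(\frac{C'C}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{[ns]} c_t z_t'g(t/n) - A_{[ns]}\Bigl(\frac{C'C}{n}\Bigr)^{-1/2}\frac{1}{n}\sum_{t=1}^{n} c_t z_t'g(t/n)\Bigr\}
\]

plus an $o_p(1)$ term. Finally, if $c_t = 1$ then (18) follows; and if $c_t = x_t$, then (19) follows.         □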
Proof of Theorem 4. The proof is virtually identical to that of Theorem 2, except that under (A.3'), (49)-(51) are replaced by

\[
\frac{1}{n}\sum_{t=1}^{[ns]} x_t h(t/n) \xrightarrow{p} \int_0^s \frac{dQ(v)e}{dv}\, h(v)\,dv,
\]

\[
\frac{1}{n}\sum_{t=1}^{[ns]} x_t x_t'g(t/n) \xrightarrow{p} \int_0^s \frac{dQ(v)}{dv}\, g(v)\,dv,
\]

\[
A_{[ns]}(X'X/n)^{-1/2} \xrightarrow{p} Q(1)^{-1/2}Q(s)Q(1)^{-1},
\]

respectively, where $Q(v)e$ is the first column of $Q(v)$. The first convergence is a special case of the second due to the presence of a constant regressor.         □